Preface



Norman L. Webb 
University of Wisconsin-Madison 



Ten years ago the National Council of 
Teachers of Mathematics released the first set 
of K-12 national content standards. Over the 
past decade, standards have been developed 
for most other content areas. Now nearly all 
of the states have content standards and 
assessments for mathematics, science, and 
language arts. The advancement of systemic 
reform has coincided with this massive effort 

on the part of states and districts to describe 
and assess more clearly what students should 
be able to know and to do in a multiplicity of 
content areas. Coinciding and closely linked 
with standards-based reforms, systemic 

reform has evolved from the theory developed 
by Smith and O’Day in 1991 into practice as a 
change strategy for surmounting the difficult 
problem of enabling all students to meet 
challenging content standards. 

A national forum on evaluating systemic 
reform is both timely and necessary at this 
crucial point in the advancement of system- 
wide improvement. After a decade of 
experience, research studies, evaluations, and 

reflection, we have a considerable amount of 
information on attempts towards systemic 
reform and its evaluation. A spectrum of 
models of systemic reform that varies widely 
in the degree of success emerges from this 
information. The National Science Foundation 
(NSF) has spent hundreds of millions of 
dollars on systemic initiatives; now under 
pressure, Government Performance and 
Results Act (GPRA) personnel are seeking 
hard evidence of what the true impact of its 
massive effort to improve science and 
mathematics student performance has been. 

The National Institute for Science Education 
(NISE) Forum on the Evaluation of Systemic
Reform in Mathematics and Science has two 
purposes. The first is for us to reflect on what 
we understand about the evaluation of reform 
in education systems. The second is to 
encourage and support continuing efforts to 



learn more about how evaluation can serve the 
multiple analytic needs in systemic reform for 
accountability, efficiency, and decision- 
making. See Appendix B for a summary 
evaluation of the Forum-based evaluations 
completed by Forum participants.

Our attention at this Forum and the work 
of NISE in studying systemic reform focuses 
on reform in mathematics and science. We 
acknowledge the important interactions of 
mathematics and science with other content 
areas and do not want the limiting of our 
focus to these two content areas to be 
interpreted as ignoring the value of other 
content areas. We have restricted our attention 
to mathematics and science because of the 
mission of the National Science Foundation 
and the benefits for studying reform with a 
content-specific approach. By attending to 
mathematics and science, we can build on the 
significant research that has been conducted 
on teaching and learning in these content 
areas. We can more easily trace activity 
through systems and find the connections 
among policy, administration, curriculum, and 
learning by focusing on these content areas. 
Systemic reform only in mathematics and 

science, however, is insufficient for full 
systemic reform. Thus, what we learn from 
evaluating systemic reform in mathematics 
and science will be relevant to the evaluation 
of related reform in any content area and to 
systemic reform in general. 

A cornerstone of systemic reform is the 
establishment of high standards and a 
commonly shared vision or image of an 
idealized education system (Smith & O’Day,
1991). More traditional reforms focus on a 
single component or unit and incremental 
change, whereas systemic reform considers all 
of the components, their interactions with 

each other, and their alignment in attaining 
common goals. In theory, school-based 

reform, curriculum reform, and other 
singularly focused reform initiatives will be
insufficient to sustain an effort to attain 
significant improvement in student learning 
without attending to other system 
components. Those successes that can be 
achieved through school-based reform will be 
deterred or inhibited by shifts in policy 
through state and district mandates or a 
diminishing teaching force of knowledgeable 
and well-trained teachers. Standards-based 
reform is important to a systemic reform, but 
does not imply that the reform is directed 
toward systemic change. Other components 
within the system, such as professional 
development, accountability, teacher 
preparation, and resource allocation, need to 
be addressed to achieve standards-based 
systemic reform. A state or district education 
system will make progress towards systemic 
reform when policies, administration, 
teaching, and curriculum are working in 
concert with each other in an effort directed 
toward promoting improved learning of 
challenging content by all students. The 
NSF’s six critical drivers describe the
components of a successful systemic reform 
process: 

• An array of evidence that the reform
has enhanced student performance in
challenging mathematics and science
material.

• Promotion of improved achievement
by all students in the system.

• Implementation of a comprehensive,
standards-based curriculum supported
by needed professional development
and assessment practices.

• Development of a coherent and
consistent set of policies that supports
educational systemic reform.

• Convergence of all resources to
support the systemic reform through a
focused and unitary strategy.

• Broad-based support from all
segments of the community.

Over the past four years, the NISE 
systemic reform team has studied system 
reform and its evaluation. We have interacted 
on a number of occasions with those who 
were doing the evaluations of systemic 
initiatives and systemic reform. We have tried 
first to illuminate what the questions are that 
we should be asking about the evaluation of 
systemic reform. During our exploration of 
these issues, we mined the evaluation 
literature and talked to those who were trying
to evaluate systemic reform. Then we studied
specific strategies and approaches for 
conducting evaluations of systemic reform in 
mathematics and science. 

Out of this work we have developed a basic 
understanding of the evaluation of systemic 
reform. That process continued at the 1999 
Forum. 
CHALLENGES TO EVALUATING SYSTEMIC REFORM



Norman L. Webb 

National Institute for Science Education 



Welcome to this NISE Forum on 
Evaluating Systemic Reform. I am excited 
about the line-up of speakers and the diversity 
and the depth of experience all of you bring to 
this most important issue. Our conference 
format, which has evolved over a number of 
years, has worked well for enabling speakers 
to raise stimulating ideas that are discussed, 
dissected, and added to in the small-group 
discussions. 

Evaluation of systemic reform is one of 
the crucial issues facing education today. It is 
like solving a giant jigsaw puzzle without the 
aid of the picture on the cover. Only through 
effective use of information and the close 
scrutiny of evaluation studies can 
improvements in our education systems be 
documented and understood. Without accurate 
and informative studies of the reform process, 
we face the prospect of repeating failures, 
acting without any sense of progress, and 
being subject to repeated whiplash from the 
onslaught of political and educational fads. 

Systemic reform is one of the most 
innovative, massive, and ambitious attempts 
at education improvement our country has 
experienced since the curricula reform and 
Great Society era of the 1960s. The systemic 
initiatives of the National Science Foundation 
(NSF) have been a bold risk venture to 
improve science and mathematics education. 
They confront directly our society’s needs for
a strong economy and informed citizens. 
Congressman Vernon Ehlers, who will speak 
to us at the reception this afternoon, chaired a 
committee that in September released a 
Congressional report entitled, Unlocking Our 
Future: Toward a New National Science 
Policy. I recommend that all of you read this 
report, which is available on the Web. We 
also have copies on the display tables. Ehlers’s
committee was charged with developing a
long-range science and technology policy for 
the nation. This significant document 



recognizes the vital role that education must 
play in this process. I quote from the report: 

Our system of education, from 
kindergarten to research universities, must 
be strengthened. Our effectiveness in 
realizing the vision [to maintain and 
improve our country’s pre-eminent
position in science and technology] will 
be largely determined by the intellectual 
capital of the Nation. Education is critical 
to developing this resource. 

Mathematics, science, and technology 
continue to advance at a rapid pace. For
education to produce the intellectual capital to
maintain our nation’s economic strength,
our schools must do things they have
never done before. Our schools are challenged
to teach a more diverse population than ever 
before. They are required to teach somewhat 
different mathematics and science that has 
never before been taught on such a large 
scale. And, schools are asked to do this in a 
rapidly advancing technological environment. 

Complacency can breed mediocrity. 
Ignoring the reservoir of untapped talent of 
the under-served in education, and having 
students be any less than they can be weighs 
down our society and creates lethargy and 
failure. Maya Angelou, in a recent address to 
the Wisconsin teachers, encouraged them to 
teach each youth as if she or he is the next 
Einstein, Andrew Wiles, Madame Curie, or 
Bill Gates. Gloria Ladson-Billings reminds us 
that what students can learn is not
predetermined. 

Ten years ago, systemic reform was a 
topic found only in books. We now have 
nearly a decade of learning and experience 
about how education systems have tried to 
advance large-scale change. These 
experiences have breathed life into this 
theoretical vision toward change. Simply
stated, systemic reform is 

a process that extends over a long period 
of time and that has to engage a number 
of people in system improvement through 
changing multiple system components and 
their interconnections concurrently. 

Systemic reform in education does not 
imply uniform practice nor the prevention of 
innovation. It does not imply only one 
strategy for change. Nor does it imply that 
there has to be a strong centralized system 
rather than a more locally controlled system. 

It does imply that a system needs to add 
greater stability, improve alignment, remove 
barriers and countervailing forces, create 
stronger links among components, and work 
with all teachers so that all students will have 
the chance to obtain knowledge of important 
science and mathematics. 

Nobody said reform is easy. I draw 
strength from what Neil Postman wrote in the 
1970s in his book, Teaching as a Conserving 
Activity. What makes education resistant to 
change by those who are well meaning and 
have the knowledge of what should be also 
inoculates education from the destructive
viruses of fools, ill-placed quick fixes, and
charlatans.

An important role of evaluation is to 
generate models and conceptualizations of 
what is being evaluated. Many of the 
evaluators present here have advanced their 
models of systems and systemic reform, 
including the SRI (Stanford Research 
Institute) pyramid to name one example. A 
very simple model of an education system 
consists of four general components: policy,
management, programs, and student 
outcomes. These components and their 
functions do not reside at any one level such 
as the state level, school level, or classroom 
level, but incorporate and exist at all of these 
levels. Clearly, other components could be 
added to this simple model, such as the 
community. 

What has distinguished systemic reform 
from other types of reforms is that other 
reforms have focused primarily on change in 
one and only one of the components. 



Curriculum initiatives address the total 
program. School-based decision making 
primarily attends to management. State 
legislation that imposes a graduation 
requirement exists within the policy arena. 

Many of the non-systemic reform theories 
of change are generally linear and uni- 
directional. [Slide 4] Change in policy effects 
a change in management that effects a change 
in curriculum and instruction that then is to 
result in improved student achievement. 

Systemic reform is based on an 
assumption that the system components are 
interconnected, non-linear, complex, and 
adaptive. [Slide 5] Each of the components
has an influence on all of the other
components. As such, education systems are
better represented as an ecology than as an
assembly line. The conceptualization of the
system and the approach to change has strong 
implications for the evaluation and study of 
the system. For example, in systemic reform, 
student achievement is not only an outcome 
variable, but is also both an input variable and 
a process variable. 

As an outcome variable, levels of student 
achievement are specified goals and indicators 
of student learning. As an input variable, 
information on student achievement is used to 
make decisions about program, management, 
and policy. As a process variable, measures of 
student achievement communicate 
expectations, influence the enacted 
curriculum, are essential for system 
alignment, and define the performance gap 
between groups. The multiple roles of each 
system component have important 
implications for evaluation. It is not sufficient 
for an evaluator to monitor only student 
learning. Evaluators of systemic reform also 
need to understand how those in the system 
make decisions based on student achievement. 

As I think about the evaluation of
systemic reform, I think about Umberto Eco’s
essay on the theoretical possibility of creating 
a map of the empire on a scale of 1 to 1. The 
map has to represent each feature of the 
empire exactly, but cannot be placed over the 
empire being mapped because then the 
climate would be affected, causing a change 
in the terrain and forcing another change in 
the map. Eco ends his essay with two
concluding corollaries: 

Every 1:1 map always reproduces
the territory unfaithfully. 

At the moment the map is realized, 
the empire becomes unreproducible. 

Our hope in evaluating systemic reform is 
not to represent everything faithfully. To do 
so would imply the lack of a dynamic system, 
or a record of what does not exist anymore. 
Our charge is to seek data on key attributes 
that will maximize the information we can use 
to understand the extent and quality of reform. 

Evaluation of systemic reform involves 
practical research that calls upon multiple 
tools, strategies, and knowledge bases. 

Polarized positions, such as quantitative not 
qualitative, policy research not practical 
research, reform not traditional, understanding 
not drill, have no place in the evaluation of 
systemic reform. All of these facets have a 
role and need to be considered. All of these 
techniques and other techniques and views 
have to be considered in context and as 
context. 

One important function for evaluation is 
to describe what is happening, what has 
happened, and what will happen. From SRI’s
evaluation of the state systemic initiative 
program, we have important descriptive 
information such as the five main 
implementation strategies used by state 
systemic initiatives (SIs). We also know that
over one third of the middle grade 
mathematics and science teachers in the SI 
states had participated over the first four years 
of the effort. 

A second important function for 
evaluation is to judge and to verify systemic 
reform’s value as a reform strategy. Its value
needs to be established in the context of at
least three currencies: in relation to the
theory of systemic reform, in relation to the
goals of each system’s reform, and in relation
to alternative strategies. Does the reform 
create better alignment in the system? Has the 
gap in performance among groups been 
reduced while raising overall student 
achievement? Has professional development 



improved the capacity of teachers to provide 
quality instruction in mathematics and 
science? Has systemic reform led to more 
significant and sustained change than would 
have been achieved through allocating all of 
the funds for reform to the purchase of new 
curriculum materials? 

Evaluations of systemic reform need to 
judge the value and worth of reform in a 
larger context. Clearly, we seek evidence of 
improved student learning. We also need to 
seek other significant outcomes and payoffs 
related to investment in a risk venture. 
Sometimes payoffs come in the form of new 
inventions, transportable innovations, and 
system learning. A small investment of 0.1% of
an education budget may not produce the 
targeted goals, but may result in a product 
such as Tang or Velcro. The recently released 
report on the New National Science Policy 
points to federally supported research on the 
molecular mechanisms of DNA, the so-called
“blueprint of life,” that led to recombinant
DNA technology (gene splicing), which in 
turn spawned an entire industry. An 
important function of the evaluation of 
systemic reform is to identify the innovations, 
the spin-offs, and the learning that take place. 
One unexpected derivative of the Puerto Rico 
systemic initiative has been the development 
of the entrepreneurial function of schools. The 
schools, funded by the state systemic initiative 
for only a limited amount of time, had to 
develop management and marketing skills to 
seek continuing funding from other sources 
within the community. This is an important 
finding, but it also needs to be analyzed to 
determine whether it is a viable strategy that 
will prove, for example, to be a sustained 
source of resources for needed, on-going 
professional development of teachers. What is 
the evidence that schools as entrepreneurial 
enterprises will meet the challenges of reform 
while not taking away teachers’ needed time
and energy in performing the important 
function of educating students? Where is the 
balance and how do teachers and principals 
reach a suitable compromise? 

A third function of evaluation is to 
explain. One value of science is to explain 
physical phenomena, the structure and the 






compositions of galaxies, the genetic make-up 
of living creatures, and the interaction of 
people. As in physics, astronomy, biology, 
and psychology, one value of systemic reform 
evaluation is to explain how reform leads to a 
large percentage of students achieving 
challenging and high quality mathematics and 
science. Through clear explanations of the 
link between reform efforts, classroom 
practices, and student activities, we come to 
understand how reform contributes to 
improved student learning. One form of 
explanation is to isolate the primary reasons 
for an impact, thus connecting an effect to a 
particular cause. Another form of explanation 
is by eliminating alternative hypotheses and 
increasing confidence in conclusions and the 
cause of an event. Explaining why change has 
occurred through systemic reform evaluation 
is more of the latter than the former.

A fourth function of systemic reform 
evaluation is to offer, or participate in, 
recommended changes toward improvement 
in the direction and the nature of reform. 
Michael Scriven, in his Hard-Won Lessons in
Program Evaluation (1993), does not fully 
accept this position and argues that evaluators 
should steer toward evaluative conclusions, 
but are generally in a weak position to offer 
recommendations. Understanding the logic of 
a large education system requires significant 
effort by an evaluator, or anyone else. When 
the longevity of district superintendents can 
be as little as two or three years, systemic 
initiative directors come and go with 
frequency, and NSF staff continually rotate, 
the evaluators of systemic initiatives have 
become the one constant. The evaluator who 
understands what is happening becomes a 
critical source of information and insight. I 
believe that not only should the evaluator at 
least participate in drawing up 
recommendations and setting new courses of 
action, but that the evaluator is ethically 
required to do so. 

Evaluators of systemic reform in 
mathematics and science face numerous 
challenges. Many of the speakers at this 
Forum will address some of these both in their 
papers and in their presentations. It is the 
challenges that make evaluation of systemic 



reform so interesting. The size and complexity 
of large district and state systems force us to 
break the problem into smaller parts and to 
conduct a series of coordinated studies. The 
dynamism of education systems forces 
evaluators to develop iterative plans that have 
to be periodically updated and refined, as 
more information is gathered and more 
knowledge about the system and the progress 
of reform is gained. The need for reform to
saturate the system, to go to scale, and to 
leverage resources forces evaluators to 
consider the whole system as the unit of 
analysis. Evaluators need to understand what 
Jane Kahle has called the “pressure points”
within the system. They also need to analyze 
and judge the viability of strategies and 
theories of change to address the full problem. 
Summary information, such as mean test 
scores, is insufficient as a basis for decisions 
on systemic reform. More detailed 
information is essential. An important 
challenge for evaluators is to gain access to 
what information is needed to understand 
reform. I am more convinced than ever, from
our work with Milwaukee Public Schools, that
embedded evaluation, where evaluators
work interactively with those in the system in
addressing problems, is necessary to gain a
deep understanding of what the system is and 
what is necessary to seek reform. Finally, any 
study of systemic reform needs to consider the 
time frame within which significant change is 
to be attained. Forcing judgment on the 
significant progress of reform and change in 
student learning after only one or two years 
ignores entirely the complexity of education 
systems and what is required to penetrate 
them in order to achieve sustained 
improvement. 

A small group of us have been engaged in 
writing a book on the evaluation of systemic 
reform. This group, which includes Dan Heck, 
Jeannie Rose Century, Norma Dávila, Eric
Osthoff, and myself, has built on the work of 
several NISE Fellows and the many others 
who have shared their experiences with us. A 
centerpiece of the book is a section describing 
the nine attributes that are important in 
considering the evaluation of systemic reform. 
We have clustered these attributes as they 






relate to systemic reform and in contrast to 
other types of reform. Five target attributes, 
those that are essential if systemic reform is 
to be achieved, are alignment, saturation, 
linkages, equity, and quality. Two enabling
attributes represent features of the system that
need to be changed to lay the foundation for
systemic reform. These two enabling
attributes are capacity and sustainability. The
explanatory attributes, incentives and trade- 
offs, help to explain or provide reasons for the 
advancement, or non-advancement, of 
systemic reform. When evaluators are confronted
with complex challenges, such collaboration is essential.

Andrew Wiles, as a case in point,
accomplished one of the most astonishing
intellectual endeavors in our time when, in
1994, he proved Fermat’s Last Theorem. For
seven years, he devoted himself to the 
solution of a problem that had evaded the 
grasp of the greatest mathematicians for 350 
years. Wiles first read about Fermat’s theorem
when he was ten years old. At the age of 41, 
he accomplished his boyhood dream. In so 
doing, he demonstrated the importance of 
building on the work of others from all 
corners of the world. Raised in Cambridge,
England, he did his work at Princeton 
University. Wiles used the work of two 
Japanese mathematicians, Taniyama and Shimura, along
with that of Galois (which Wiles studied when 
he was a teenager in France), the work of a 
University of California at Berkeley 
mathematician Ken Ribet, and many more. 
After three years of non-stop work, Wiles 
attended a major conference on elliptic 
equations, where he learned about a method 
first devised by Kolyvagin, a Russian 



mathematician. This method proved to be an 
important key to developing the proof. 

Those of you who have come to this NISE 
Forum on Evaluating Systemic Reform have 
an opportunity to interact and learn from those
who have been at the center of systemic 
reform evaluation for several years. These are 
the people who have performed nearly all of 
the work in evaluating systemic reform, or 
who have worked with those who have led the 
effort to evaluate systemic reform. 

An important goal of this Forum is to 
provide you and others an opportunity to learn
from each other and to learn how others are
thinking about the problem. Who knows, 
maybe you will hear about an approach that 
will be the key to how best to study systemic 
reform [Final Slide], an approach that will
help us better to describe, explain, judge, and
recommend how the pieces of the puzzle fit
together. Hopefully, though, none of you will
write in the margin, “I have a truly marvelous
demonstration of evaluation of systemic
reform, which this margin is too narrow to
contain,” without providing a full disclosure of
what you have discovered.
Panel 1: Understanding Evaluation of Systemic Reform



Panel Papers and Authors: 

The Detroit Urban Systemic Initiative: A Promising View of Systemic Reform 
Juanita Clay Chambers, Detroit Public Schools 
Understanding Evaluation of Systemic Reform: Purposes and Vision for Evaluation 
Daniel Heck, University of Illinois at Urbana-Champaign 
Tracking the Theory of Change: A Moving Target 
Zoe A. Barley, Western Michigan University 

Evaluating Systemic Reform: A Complex Endeavor
Iris R. Weiss, Horizon Research Inc. 



Discussion Summary and Commentary: 
Understanding Evaluation of Systemic 
Reform 

Norman L. Webb 

Panel 1 set the stage for the other panels 
by providing a context for evaluating systemic 
reform. The four presenters discussed efforts
towards systemic reform, reasons for
engaging in the evaluation of systemic
reform, the conceptualization of a systemic
initiative, and the complexity involved in evaluating
systemic reform. 

Juanita Clay Chambers, a staff member of 
Detroit Public Schools and its Urban Systemic 
Initiative (DUSI), set the stage for discussing 
the evaluation of systemic reform. She 
described the challenges the Detroit Urban 
Systemic Initiative faces in the 10th largest 
school district in the country with limited 
resources, a high proportion of students from 
economically challenged families, a large and
diverse group of teachers, and varying degrees 
of community support. Measuring the 
progress of the DUSI was viewed as 
important because the data challenged the 
initiative to consider the efficiency and 
effectiveness of its strategies. DUSI was 
designed to reach mathematics and science 
instruction in all schools and with all students 
in the district over five years. The school 
building was identified as the unit of change. 

A standards-based core curriculum for 
mathematics and science and professional 



development in constructivist teaching were 
the main vehicles of reform. Two research 
questions drove the evaluation of the DUSI: Is 
the initiative effective in significantly 
improving student achievement in 
mathematics and science and is the initiative 
effective in producing a system that supports 
and sustains improved student achievement 
over time? A multi-method design was used 
for the evaluation. In-depth case studies were 
conducted in six schools, equally divided 
among the first two tiers of the three tiers of 
schools. Surveys were conducted in 54 
schools randomly selected among the three 
tiers. All the mathematics and science 
teachers in each school sampled completed a 
survey. Findings from the evaluation 
supported an increase in the number of 
teachers engaged in professional development 
in mathematics and science; a higher number 
of students enrolled in advanced mathematics 
and science courses; and improved scores at 
all grade levels on the state criterion- 
referenced test and the district norm- 
referenced test. 

Daniel Heck, at the time of his 
presentation a graduate student at the 
University of Illinois at Urbana-Champaign, 
outlined purposes for doing evaluations of 
systemic reform. He emphasized that systemic 
reform is a theory of change to move an 
education system toward an ambitious vision 
of learning. Evaluations of such ambitious 
efforts have to be equal to their task. Three 
developmental issues education systems face 
define the meaningful purposes for evaluation
of systemic reform: (1) the need to understand 
and manage change throughout a large, 
complex system; (2) the need to track the 
nature and extent of change over time; and (3) 
the need to build and test a theory of systemic 
reform operating in the system’s context. 
Understanding and managing systemwide 
change requires evaluations to be approached 
from a systems view through adhering to the 
system in its totality, the complexity of the 
reform, and the full integrity of the reform. 
Because reforming education systems requires 
time, evaluations of systemic reforms need to 
consider a time frame over some duration. 
Instrumental in projecting progress over time 
are “baseline indicators” to establish the state 
of a system at some point in time, signs of 
progress, and trajectories towards future 
outcomes. All evaluations of systemic reform 
operate in a context in which this approach to
change is only a theory. Information produced 
through evaluations then becomes a major 
source for challenging or affirming this theory 
for change. 

Zoe Barley, at the time a researcher and 
evaluator at Western Michigan University, 
emphasized that one important reason for
advancing systemic reform is the general 
failure of attempts to change specific system 
components in isolation such as teaching, 
instructional materials, and curriculum. She 
reported on the emerging understanding of 
theory-driven evaluation as an approach for 
shaping the evaluation of statewide systemic 
initiatives (SSIs) funded by the National 
Science Foundation. A first task in a theory- 
driven evaluation is to give form and 
specification to the theory. In the case of the 
SSIs this required documentation analyses and 
conversations with program directors. Two 
types of theories need to be explicated. A 
normative theory needs to be formed on the fit 
of the actual initiative activities with the 
intended intervention and the “design theory.” 
A causative theory needs to be formed on
what the initiatives’ impact is.

In the initial stages of a systemic reform 
evaluation, one approach to presenting the 
design theory is to develop a logic model, a
conceptual representation of the relationships 



among the relevant inputs, intervening factors, 
intermediate benchmarks, and interventions 
with program staff. Through specific 
examples, Barley illustrated the usefulness of 
developing logic models to negotiate with 
project staff what an evaluation should 
emphasize and what are possible gaps 
between the planned work and the desired 
outcomes. She noted the need for rethinking 
the logic models based on untested theories. 

In one example, the role of the community in 
a large hierarchical urban district was
hypothesized as a necessary precursor to 
radical thinking of teaching and learning. This 
required the evaluators to rethink their logic 
model and the reallocation of evaluation 
resources. More effort was spent on observing 
the emerging relationship between key 
community leadership and the district 
administration. A theory-driven approach to 
systemic reform benefits both the evaluators 
in shaping their work and the reformers who 
gain a graphical representation of the 
relationship between their strategies, the 
multiple system factors, and the results they 
seek. 

Iris Weiss, president of Horizon Research, 
Inc., draws a parallel between evaluation of 
systemic reform and other reform efforts from 
her experience in evaluating a number of 
SSIs. Although evaluators can use time- 
honored evaluation strategies with reasonable 
confidence in studying the reforms of 
individual system components, evaluators of 
systemic reform have less assurance. The 
increased complexity of evaluating multifaceted
systemic reform initiatives places
evaluators in a position where proven
techniques may not apply. They have to
address and consider more components of the
system: parts of the system not directly
covered in the plan, sustainability of efforts,
and expanded resources. They also have to 
seek a broader and deeper understanding of 
educational systems. In more traditional 
efforts of reform such as professional 
development of teachers, the goals and the 
interventions generally are clearly identified. 

In systemic reform, although the goals are 
understood, the bewildering array of options 
for intervention increases the likelihood that 
an evaluator’s critique may only add to the
confusion. 

Evaluators of systemic reform are 
confronted with targeting the available 
resources and setting priorities for what will 
be studied. This requires negotiation with 
project leadership who will expect more than 
the evaluation can deliver. Evaluators need to 
seek a balance in reporting findings and 
making recommendations for a major 
redesign. Recommendations can be 
circumvented if reported confidentially to 
only a few people, can be meaningless 
because the bulk of the system prevents any 
mid-course corrections, or can exceed
what the existing leadership knows how
to do. Evaluators have additional
pressures exerted on them by funders who 
seek evidence of impact to report to policy 
makers long before the initiatives have had a 
reasonable time to mount the effort
necessary to make needed changes. Such
pressures are increased by the lack of
appropriate outcome measures of reform:
student achievement and system attributes,
such as alignment. All of these reasons, along
with high visibility within a charged political 
arena, distinguish evaluation of systemic 
reform from what most evaluators of more 
traditional and restricted programs face. 

THE DETROIT URBAN SYSTEMIC INITIATIVE: A PROMISING
VIEW OF SYSTEMIC REFORM 



Juanita Clay Chambers 
Detroit Public Schools 



Statement of the Problem 

The United States is currently 
experiencing fundamental changes in every 
aspect of society. The popular notion is to 
change the structure of an organization if it is 
not functioning properly. New structures are 
constantly being created without adequate 
attention to institutionalization of behaviors, 
if the desired changes are to occur. During 
the past decade, teachers and groups from the 
private sectors have advocated for 
fundamental changes in the structure and 
outcomes of public education (Goodlad,
1984). Frustration is high about the cost and
quality of public education, with recent
reports raising concern that our nation is still
at risk, nearly a decade after the publication of
“A Nation At Risk” pronounced the need for
drastic changes in public education (Murnane
& Raizen, 1988).

As widely reported in the media, there is a 
crisis in education with long-term social, 
economic, and political consequences for the
future of the nation. Parents, educators,
business leaders, university representatives,
students, and the community in general cry
out for school reform. As the 21st century
approaches, the demand for change and
improvement is heightened if students are
expected to cope and live successfully in an
ever-changing, technologically advanced
society.

Across the nation, mathematics and
science educators are engaged in large-scale
reform efforts. A plethora of reports
describe science achievement of the nation’s
youth (Jacobson & Doran, 1991; Mullins &
Jenkins, 1988; National Commission on
Excellence in Education, 1983) and have
served as the driving force for change.
Implementing reform in science education
requires teachers who are knowledgeable in
science content, process and inquiry pedagogy 
(Radford, 1998). The challenges faced by 
urban districts are enormous and multifaceted. 
The National Science Foundation (NSF) has 
undertaken a national effort to respond to the
problem in science education by undertaking 
comprehensive reforms through states and 
large urban districts with a high poverty 
index. NSF’s strategy obligates states and 
large urban districts to mobilize broad-based 
coalitions to implement ambitious reform 
efforts in mathematics and science that are 
based on the premise that all children can 
learn if provided with a rich instructional 
environment. The second premise is that state 
and local policy changes can create these 
opportunities by providing a consistent and 
supportive policy structure for school 
improvement. 

The Detroit Public Schools is poised on 
the brink of a new era that signals both the 
promise and the challenge of a fundamental 
transformation of mathematics and science 
education. The promise lies in the significant 
efforts that are presently underway upon 
which the Detroit Urban Systemic Initiative 
can build. The challenge is reflective of the 
significant needs that are inherent in any large 
urban district (district size and complexity; 
limited resources; numerous ongoing 
programs and initiatives to align with the 
overall reform effort; a high proportion of 
students from economically challenged 
families; a large and diverse group of 
teachers; historically disparate resources by 
building; varying degrees of community
support, involvement, and empowerment).

These challenges have often been allowed to 
overshadow the reservoir of intelligence, 
academic potential, curiosity and enthusiasm 
with which our students enter kindergarten. 
The progress of the Detroit Urban Systemic
Initiative (DUSI) is important to measure over 
time. By using measures in an ongoing 
assessment and improvement process, the 
initiative is challenged to consider its 
strategies in terms of the relative efficiency 
and effectiveness. 

Process of Reform 

DUSI is the district’s main vehicle for 
achieving educational reform in mathematics 
and science. The initiative is linked to other 
essential components of the reform effort such 
as the Michigan Statewide Systemic Initiative 
(MSSI), the Detroit Mathematics and Science 
Centers, the Center for Learning 
Technologies, and the district Professional 
Development Council. As a result of the tight 
alignment of these components, a strong and 
holistic presence for mathematics and science 
education reform has been established in 
Detroit. 

From the beginning, DUSI determined 
that changes made as a result of its work 
would be system-wide and of major 
consequence in totally reforming the teaching 
and learning of mathematics and science. 
Understanding the enormous challenges of a 
large urban district, DUSI developed a tiered 
process for implementation that allowed the 
district to learn from and scale up to full
implementation over the five-year USI grant
period. The first tier of three constellations
(33,195 students) began the process in 1994-95.
Tier II followed the next school year,
adding six constellations and three alternative
schools (62,295 students). The third year, the
final tier began the process with fourteen
constellations and six alternative schools
(72,5 110 students). Thus, the scaling up
process (outlined in Table 1) was planned to
engage all schools beginning in 1996-97 and
to have full implementation in all schools
beginning in 1998-99.

Table 1
Tier Structure for Scaling Up


DUSI is organized to enact its theory of 
reform through simultaneous change in all 
major systems of organization and structure, 
classroom practice (including curriculum, 
instruction, and assessment), professional 
development, and community involvement. 
This theory of reform was introduced to 
teachers through “Articulation Sessions” 
which were initiated as a constellation entered 
the first year of DUSI. These sessions brought 
together mathematics and science teachers 
from all schools in a constellation and served 
to open communication lines, foster 
cooperation between schools, provide staff 
development, and initiate partnering activities 
for students and teachers. Although 
constellations served as the format for
dissemination of the vision, the unit of change 
has been identified as the building. It is within 
the building that reform must be 
operationalized. 

Using this communications vehicle, the 
new DPS standards-based curricula for 
mathematics and science were disseminated 
and further professional development and 
scale up for curriculum implementation was 
planned. At the elementary level, specialists 
in mathematics and science and building-level 
teacher leaders were trained in the new Core 
Curriculum to support other classroom 
teachers in mathematics and science 
instruction. At the secondary level, unit heads 
(middle school level) and department heads 
(high school level) were trained to assist 
mathematics and science teachers in 
constructivist teaching. 

District support staff also were developed 
to serve as resources in curriculum content as 
well as pedagogy. Simultaneously, the district 
began to develop means to recognize, support, 
and enable the involvement of parents and 
other community members in the educational 
process. 

Theoretical Framework 

The Detroit Urban Systemic Initiative
(DUSI) has been structured to connect with 
classroom teachers in direct and strategic 
ways. One of the first activities at the onset of 
the DUSI involved creating a document which 
articulated principles of teaching and learning 
that might ultimately improve student 
understanding and achievement. This 
document, A Constructivist Vision Towards 
Teaching, Learning, and Staff Development, 
has served to inform administrators, teachers 
and staff of the DUSI vision for improvement 
by outlining the concepts and practices in a 
new approach to mathematics and science 
education. A key challenge in large urban 
districts is to help all stakeholders understand 
and work toward common goals. This 
constructivist vision document has served as a 
template for professional developers and 
school teams as they plan for future activities.



In order for DUSI reform efforts to 
succeed, it has not only been important for 
teachers to understand DUSI goals, but also 
for teachers to articulate their own ideas about 
teaching and learning and to think about 
changes that are needed for success. Several 
researchers support the idea that teacher 
beliefs are precursors to change and that the 
teacher is the crucial change agent in paving 
the way to reform (Ajzen & Fishbein, 1980; 
Crawley & Koballa, 1992; Cuban, 1979;
Fullan & Miles, 1992; Jenlink, 1995). 
Additionally, some researchers have noted 
that previous attempts at science reform fell 
short of successful change because they were 
not systemic in nature and often embodied a 
top-down model of change (Anderson & 
Mitchener, 1994; Bybee & DeBoer, 1994; 
Cuban, 1990; Fullan & Miles, 1992; Gordon, 
1993; Sashkin & Egermeier, 1992).

A study by Haney, Czerniak, and Lumpe
(1996) further articulated the importance of 
teacher beliefs on changes in practice: 

In other words, teacher perceived 
outcomes regarding the behavior at hand 
and the likelihood that these outcomes 
will occur to be major influences on 
behavioral intention; therefore, 
contemporary reform cannot afford to 
ignore the importance of such beliefs. . . . 
The obstacles and enablers that the 
teachers were provided mattered less to 
them than did their beliefs about the 
positive and negative outcomes 
associated with the behavior. This 
finding suggests that teacher training 
should pay particular attention to the 
factors (such as providing curriculum 
materials, reducing class size, including 
flexible class scheduling, etc.) that are 
expected to lead to lasting changes in 
classroom practice. (p. 985)

Although targeting teacher belief systems 
may be viewed as critical to change, there are 
many other obstacles that may impede 
progress. Sparks (1994) made 
recommendations for effective, sustained, 
high quality staff development. Among the 
recommendations that were interwoven into 
the design and format of the professional
development experiences were:

• Keep the focus on student learning.

• Recognize that change affects
staff members in personal ways.

• Change the organization’s culture at the
same time that individual teachers and
administrators are acquiring new
knowledge and skills.

• Use a systems approach to change.

• Apply what is known about the change
processes to the improvement effort.

• Make certain that the learning process for
teachers models the type of instruction that
is desired.

• Provide generous amounts of time for
collaborative work and various learning
activities.

Evaluation 

An evaluation process continues to 
examine the impact of the Detroit Urban 
Systemic Initiative on the improvement of 
mathematics and science in Detroit Public 
Schools. Emphasis is placed upon the 
implementation of standards that articulate 
what is important for students to know and do 
in mathematics and science; improved 
delivery systems, professional development, 
student enhancement activities, parental 
involvement and policy alignment. The major 
research questions for the evaluation follow: 

1. Is the initiative effective in significantly 
improving student achievement and 
accomplishments in mathematics and 
science? 

2. Is the initiative effective in producing a 
system supporting such improved 
student achievement and capable of 
sustaining this accomplishment over 
time?

Data were gathered from two major 
stakeholder groups (teachers and students) 
over the years comparing the students and 
teachers before and after they experienced the 
program. 



Methodology 
The Target Population 

For two years, six schools were randomly 
selected for an in-depth study. This selection 
was based on the schools’ position in the 
staging process of the initiative. K-12 
systemic reform in Detroit will see students, 
K-12, and all teachers with responsibility for 
mathematics and science impacted by the 
change strategies as described. The goals of 
the initiative are: (1) to improve the 
mathematical and scientific literacy of all 
students; (2) to provide the mathematics and 
science fundamentals that will enable 
successful participation in a technological 
society; and, (3) to significantly increase the 
number of students that will enter 
mathematics, science and engineering careers. 

Detroit paced the implementation of the 
objectives of the USI over the first three years
by involving sets of the District’s 23 
constellations each year in stages. These K-12 
constellations will become learning 
communities in which staff come together 
periodically to plan, train, and make 
articulation decisions toward achieving the 
goals of the initiative. While all constellations 
were involved in the change process from the 
beginning, they were at different stages 
depending on their current status as it relates
to staffing patterns. A profile of each K-12 
constellation was developed to determine its 
status in relation to the following three stages 
of involvement: (1) In-depth, successful, 
implementation of strategies and identified 
professional development; (2) preparedness 
for implementation; and, (3) awareness/ 
readiness/commitment. During the first year, 
three constellations were selected to start 
stage one, as their profiles revealed a 
significant number of activities in place for 
the purposes of the USI (these constellations 
became known as Tier I). In addition, during 
the first year, six other constellations began 
stage two, preparing for implementation (Tier 
II); and the remaining fourteen constellations 
began stage three developing awareness, 
readiness, and commitment (Tier III). For the 
second year of the initiative, the six
constellations that started in Tier II in the first
year began stage one, in-depth
implementation; and the remaining fourteen 
constellations began stage two preparing for 
implementation. All constellations were 
involved with the in-depth implementation by 
the third year of the initiative. By the fifth 
year, the results of systemic change will be 
evident. 

Of the six schools selected for this in-depth
study, three schools are from Tier I,
which was impacted more by the innovation 
and professional development offered by the 
initiative. The remaining three schools were 
selected from Tier II, which allowed for an 
investigation of the impact of the curriculum 
innovations, professional development, and 
curriculum implementation of the initiative. 
The second group of case study schools was
selected because of exemplary performance
on standardized measures. 

In addition to the in-depth case studies, 
surveys were conducted in 54 schools 
randomly selected by tier. The sampling plan 
was a non-proportional sample to represent 
tiers and school levels. The sampling plan 
provides for random selection and reasonably 
sized samples. A two-stage sampling process 
was used for teachers with random selection 
of schools by tier and subsequent surveying of 
all mathematics and science teachers within 
the randomly selected school. Adequate 
representation by Tier (I, II, and III) was 
provided for. The sampling is 
disproportionate and appropriate weighting 
was conducted. All mathematics and science 
teachers in the randomly selected schools 
were surveyed along with their students. 
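
The sampling plan described above can be sketched in code. The following is only an illustration under stated assumptions: the school counts per tier, the allocation of the 54 sampled schools across tiers, and the function names are hypothetical and do not come from the study. It shows stage one (random selection of schools within each tier) and the design weight that a disproportionate sample calls for, namely the number of schools in the tier divided by the number sampled from it. Because every mathematics and science teacher in a selected school is surveyed, a teacher's weight in this sketch equals the weight of the school.

```python
import random


def sample_schools_by_tier(schools_by_tier, n_per_tier, seed=0):
    """Stage one: simple random sample of schools within each tier.

    Each selected school carries a design weight N_h / n_h, the inverse
    of its selection probability within its tier.
    """
    rng = random.Random(seed)
    sample = []
    for tier, schools in schools_by_tier.items():
        n = n_per_tier[tier]
        weight = len(schools) / n
        for school in rng.sample(schools, n):
            sample.append({"tier": tier, "school": school, "weight": weight})
    return sample


def weighted_mean(values_and_weights):
    """Weighted mean of (value, weight) pairs, e.g. survey item scores."""
    pairs = list(values_and_weights)
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical sampling frame: these school counts are invented for
# illustration and are not the actual Detroit figures.
frame = {
    "Tier I": [f"I-{i}" for i in range(30)],
    "Tier II": [f"II-{i}" for i in range(60)],
    "Tier III": [f"III-{i}" for i in range(120)],
}
# Hypothetical non-proportional allocation that totals 54 schools.
allocation = {"Tier I": 15, "Tier II": 15, "Tier III": 24}

selected = sample_schools_by_tier(frame, allocation)
```

In this sketch the weights of the selected schools sum to the number of schools in the frame, which is the property that lets weighted survey results speak for all tiers even though the smaller tiers are oversampled.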

Focus groups were conducted among Tier I
and II teachers, parents, and Unit
Heads/Department Heads.

The study follows an Institutional Cycle 
Design (Payne, 1994), where a group is first 
assigned to a treatment (Tier I) and then is 
tested. The second group (Tier II) would be 
tested at the same time as the first group and 
then exposed to the treatment. They would 
then be post-tested. Then a third group (Tier 
III) would be tested at the same time as the 
group two post-test and receive the treatment. 
They would be post-tested after receiving the
treatment. Program impact will be measured
by Tier I post- versus Tier II pre-; Tier II post-
versus Tier III pre-; and Tier III pre- versus
Tier III post-tests. Data from the surveys were 
analyzed over a three-year period. 
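
The three comparisons named in this design can be written out as a small computation. The sketch below is illustrative only: the scores are invented, the function and variable names are hypothetical, and a plain difference in means stands in for whatever statistical test the evaluators actually applied, which the text does not specify.

```python
from statistics import mean

# Each pair is (treated group, comparison group), following the three
# contrasts listed in the text.
COMPARISONS = [
    (("Tier I", "post"), ("Tier II", "pre")),
    (("Tier II", "post"), ("Tier III", "pre")),
    (("Tier III", "post"), ("Tier III", "pre")),
]


def impact_estimates(scores):
    """Return (label, difference in means) for each contrast.

    scores maps (tier, phase) to a list of survey or test values.
    """
    results = []
    for treated, comparison in COMPARISONS:
        diff = mean(scores[treated]) - mean(scores[comparison])
        label = f"{treated[0]} {treated[1]} vs {comparison[0]} {comparison[1]}"
        results.append((label, diff))
    return results


# Invented scores, for illustration only.
example_scores = {
    ("Tier I", "post"): [3, 4, 5],
    ("Tier II", "pre"): [2, 3, 4],
    ("Tier II", "post"): [4, 4, 4],
    ("Tier III", "pre"): [3, 3, 3],
    ("Tier III", "post"): [4, 5, 6],
}

estimates = impact_estimates(example_scores)
```

The first two contrasts compare different groups measured at the same time, while the third compares one group with itself over time, so the three estimates are open to different threats to validity.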

Summary of Progress 

Some of the most important findings are: 

• Increases were observed in the
number of teachers engaged in
mathematics and science professional
development (PD) through the
Mathematics and Science Centers and
DUSI. In addition to formal PD, a
comprehensive structure supporting
change in teaching practices, which is
not captured in the data, occurs on a
daily basis in schools and classrooms.
These offerings are tightly aligned
with the desired changes in
curriculum and pedagogy so that
support for systemic reform is built
into the process for continually
upgrading teachers’ skills.

• The Tier system for implementation
has demonstrated DUSI’s impact on
both student and teacher measures.
Tier I teachers reported increased use
of standards-based instructional
practices, more involvement of
parents and community members in
the mathematics and science
programs, and greater teacher
confidence in their ability to
implement the standards-based
instruction as a result of their
involvement in PD. Students in Tier I
schools reported more positive
attitudes toward mathematics and
science instruction and confirmed the
increased use by teachers of
standards-based instructional
strategies.

• Teachers report good alignment of
curriculum with their instructional
practices, including emphasis on
developing students’ problem-solving
skills, abilities to make connections to
the real world, and skills and knowledge
for excelling on local and national
tests of mathematics and science
achievement.

• The DUSI Summer Institute was a 
very successful staff development 
activity that resulted in improved 
teaching strategies, increased use of 
new mathematics and science 
curricula in the District, and building- 
level action plans. As a result of the 
PD program, teachers reported a high 
degree of confidence in their ability to 
implement the new standards-based 
curriculum and related teaching 
practices. 

• An increasing percentage of students 
are becoming engaged in mathematics 
and science. Of note are the large 
increases in student enrollment in 
advanced courses. Also, while student 
programming has not changed 
greatly, many more students are 
involved in the program across all 
levels. 

• Student performance results indicate a 
steadily-improving rate of 
achievement as measured by the state- 
required criterion-referenced test (the 
MEAP) and the district-required 
norm-referenced test (the 
Metropolitan Achievement Test) in 
both mathematics and science at all 
grade levels. 

Evaluation of Systemic Reform in
Mathematics and Science

PROCESS OF REFORM

National Institute for Science Education

Detroit Urban Systemic Initiative

Eddie L. Green, Ed.D.
General Superintendent & Principal Investigator

DUSI:

The Unitary vehicle for achieving 
educational reform in mathematics 
and science in the Detroit Public 
School District. 

DUSI REFORM PROCESS
ADDRESSES MAJOR VARIABLES 

• District size and complexity 

• Numerous ongoing programs and initiatives to align 
with the overall reform effort 

• A high proportion of students from economically 
challenged families 

• Historically disparate building resources 

• A large and diverse group of teachers 

• Varying degrees of community support, involvement, 
and empowerment 

VALUES WHICH DEFINE THE
CULTURE OF 

SUCCESSFUL URBAN SCHOOLS 

• A pervasive belief that All students can learn and 
achieve at high levels 

• Acceptance of the premise that schools must be a 
learner-centered, caring community 

• A primary and central focus of professional staff 
on student Outcomes 

• A consensus that “Everyone” is responsible for
learning 

FOUR CORNERSTONE INITIATIVES

• Exit Skills: Grade level Performance Standards to
assure that individual students meet specific
academic targets

• Resource Coordinating Teams: Professionals
(consisting of teachers, nurses, social workers,
counselors, psychologists, attendance officer and
other supporting agencies) configured to address the
barriers to learning

• K-12 Constellations: 20 learner-centered
communities with neighborhood resources focused
on student development and progress which create
learning “villages” for learning and efficiency

• Site Based Management: A local governance
involving school-community stakeholders which
allows for decision making that is specific for a
particular school. This involves the creation of a
local council of administrators, teachers, parents
and others concerned about student progress

DUSI KEYS TO SUCCESS

• A commitment to student outcomes and
metrics to gauge project progress

• Early strategic planning and capacity building

• Establishing and gaining commitment to a clear
vision

• The development of a visionary guide entitled:
“A Constructivist Vision for Teaching,
Learning, and Staff Development”

POLICY CHANGES

• Curriculum alignment with national and state standards 

• Alignment of assessment with instructional practice 

• Increased opportunities for content-specific professional 
development 

• Initiated “new” delivery systems 

• Encouraged greater parent and community involvement 

• K-12 Mathematics and Science Resolution

• Increased graduation requirements in mathematics and 
science 

MANAGEMENT CHANGES

• Realigned Organizational Structure

• Project Director elevated to cabinet
level status

ARTICULATION OF THE VISION

• The vision becomes real in the
classroom, in the school and
constellation and across the system.

• At each of these levels, an
expression of the vision is
supported by programming efforts.

CRITICAL ISSUES RELATED TO
DUSI REFORM

• Creation of a “Learner-Centered” school

• Service Support Intervention Program
which promotes Academic Excellence
and High Achievement

• Systemic approaches to reduce/eliminate
the “Barriers to Learning”

DATA COLLECTION AND
UTILIZATION

• Monthly Department/Unit Head updates

• Data Collection Task Force established

• Documentation notebooks

• Ninth Grade Restructuring Evaluation

• MEAP/MAT Data for decision making

• Professional Development Database

• Case studies & Services Rendered forms

IMPACT OF DUSI

• Continuous improvement of test scores on
standardized measures

• Increased student participation in Mathematics and
Science Opportunities

• Improved delivery of standards-based instruction
as a result of exciting professional development
opportunities

TEACHER IMPACT

• 100% of the teachers of the Detroit Public 
Schools impacted by vision 

• 75% of the teachers of the Detroit Public 
Schools receiving direct services 

• 85% of the teachers impacted by the DUSI 
Summer Institute for Professional 
Development 

INNOVATIONS IN
PROFESSIONAL DEVELOPMENT

• Release time for planning

• Constructivist Workshops 

• Learning Logs 

• Connected Math Inservice 

• Algebra and Geometry Courses 

• Technology Strand Development 

• Ninth Grade Restructuring 

• Building PD Planning 

• Modeling 


DUSI 1997-98
Professional Development Participants

[Chart; legend: Elementary Teachers, Middle School Math Teachers,
High School Math Teachers, Middle School Science Teachers,
High School Science Teachers]

DUSI INNOVATIONS AND
MAJOR DELIVERY SYSTEMS

• TERC Investigations

• Project AIMS

• Connected Mathematics

• Insights

• Project SEED

• Core Plus Mathematics

• VideoDiscovery Science Sleuths Videolaser Discs

• 24 Challenge

• VideoDiscovery Bioforums Videolaser Discs

• FOSS

DUSI INNOVATIONS AND
MAJOR DELIVERY SYSTEMS

• Model-It

• Technology (Pasco Probes)

• Project-Based Science

• Constructivist Teaching and Learning Practices

• DUSI Summer Institutes

• Family Math and Science

• Study Groups

• Center for Learning Technologies

• University Coursework

• Integrated Natural Science (District Developed Course)

DUSI INNOVATIONS AND
MAJOR DELIVERY SYSTEMS

• Biology, Chemistry, Physics Support Series

• Apprenticeship Programs
  Henry Ford Hospital
  Karmanos Cancer Center
  Wayne State University

• Science Fair Support

• Problem Based Instruction

• Center for Molecular and Cellular Toxicology
  (Wayne State University)

• Science in the City (Michigan State University)

Detroit, Michigan 48202
Fax: 313-494-7864

UNDERSTANDING EVALUATION OF SYSTEMIC REFORM:

PURPOSES AND VISION 



Daniel J. Heck 

University of Illinois at Urbana-Champaign 



Evaluation and Systemic Reform 

Systemic reform is a theory of change 
intended to move an education system toward 
an ambitious vision of learning. Evaluation 
should inform and enhance its potential to do 
so (Rowland, 1994). It is troubling, then, that 
Carol Weiss (1991, pp. 223-224) described 
program evaluation as a political act that 
“tends to ignore the social and institutional 
structures within which the problems of the 
target group are generated and sustained.” 
Weiss added that “most of the political 
implications of evaluation have an 
establishment orientation. They accept-and 
bolster-the status quo. They take for granted 
the world as defined in the existing social 
structure.” Any evaluation that ignores the 
influence of existing educational structures 
and fails to look beyond traditionally accepted 
solutions will hardly be well-matched to 
systemic reform. Evaluations of systemic 
reform cannot behave as Weiss suggests many 
evaluations do. Systemic reform demands 
approaches to evaluation that match the 
considerable extent of the reform; the 
evolving goals of reform; the interdependent, 
emergent and responsive events and 
understandings of the reform; and the shifting 
political influences surrounding the reform 
(Bruckerhoff, 1997; Jenness & Barley, 1995; 
NSF, 1993; Ridgway, 1998). 

Purposes for Evaluation of Systemic 
Reform 

For what purposes do we evaluate 
systemic reform? Adaptation and innovation 
of activities and structures aligned toward a 
common, ambitious vision of learning 
characterize systemic reform in education. 



Cronbach and colleagues (1980, pp. 156-157)
wrote that “evaluation at its best assists in a 
smooth accommodation of social activities 
and structures to changing conditions and 
ideals.” In order for evaluations to “assist in a 
smooth accommodation” of systemic reform 
in an education system, evaluations must 
make sense of change on a large scale, over a 
long time, and within an evolving theory of 
reform. Michael Patton has offered one 
approach toward such an evaluation. Patton 
(1994) coined the term “developmental 
evaluation” to describe evaluation activity that 
focuses on continual innovation and 
adaptation such that versatile initiatives 
remain best suited to changing conditions and 
contexts. Development (that is, learning, 
innovation, and change) should be at the 
heart of systemic reform and its evaluation. 

Education systems engaged in systemic 
reform of mathematics and science face three 
key developmental issues: (1) the need to 
understand and manage change throughout a 
large, complex system, (2) the need to track 
the nature and extent of change over time, and 
(3) the need to build and test a theory of 
systemic reform operating in the system’s 
context. These three major issues render three 
meaningful purposes for evaluation of 
systemic reform. Each can be met well 
through a developmental evaluation focused 
on learning, innovation, and change. The first 
two of these purposes will be discussed here. 

Understanding and Managing Systemwide 
Change 

A key concept in systemic reform is 
systemwide change. The term systemwide is 
commonly used to describe the collection of 
districts and schools comprising an education 
system. In systemic reform, systemwide also 
refers to the major functions of the education 
system: policy and governance, management 
and administration, instruction and learning. 

In the context of systemic reform, learning, 
innovation, and change develop 
interdependently across districts and schools 
and throughout the functions of the system. 
Evaluation of systemic reform should promote 
understanding and management of change 
developing across and throughout a system. A 
proposed developmental evaluation 
framework for understanding and managing 
systemwide change involves three closely 
related, but distinct, perspectives: a whole 
system view, a holistic view, and a systems 
view. 

First, through a whole system view, 
evaluation should provide sound and thorough 
descriptions of the education system, the 
reform, and their intersections. In order to 
facilitate understanding and management of 
systemwide change, evaluations should 
identify the districts, schools, and other 
structures along with the functions that 
comprise the education system. Evaluations 
should also identify the components of the 
reform effort, and most importantly, the 
specific parts of the system being targeted by 
components of the reform. The evaluation 
should highlight intended and unintended 
points of pressure and influence between the 
reform and the system. Evaluation audiences, 
who must understand or manage systemwide 
change, should maintain this whole system 
focus so that the big picture of reform is not 
lost in the details of planning or 
implementation (Bruckerhoff, 1997; Heck & 
Webb, 1996). The whole system view 
represents the totality of the systemic reform. 

Second, evaluation should embed the 
multiple and dynamic objectives and activities 
of the reform within a holistic view of the 
reform (LeMahieu, 1997). In order to 
understand or manage systemwide change, 
evaluation audiences must appreciate how 
each activity or objective of the reform relates 
to the primary goals of the reform and overall 
vision of systemwide change. Moreover, 
when the evaluation focuses on specific 
districts or schools, or individual functions of 
the system, the nested activities and objectives 
pursued at each point within the system 
should make sense within the global vision of 
the systemic reform. The holistic view 
represents the complexity of the systemic 
reform. 

Third, evaluation of systemic reform 
should offer a systems view. Principally, a 
systems view requires that evaluation examine 
the totality and complexity of the reform 
operating as a united force within the system 
(Banathy, 1995). A systems view suggests 
that evaluations should ultimately focus not 
on the degree of success of isolated 
components of the reform, but rather on the 
degree of success of the whole reform within 
the whole system through attention to a 
shifting balance of linkages, 
interdependencies, and processes that amplify 
desired outcomes (Webb, 1997). Furthermore, 
evaluation that adopts a systems view will not 
attend to discrete causes and effects, but 
rather to evidence that the reform effort as a 
whole contributes to successful solutions to 
entrenched problems throughout the system 
(Julian, Jones, & Deyo, 1995). An evaluation 
with a systems view will provide information 
and judgments regarding priorities of the 
reform, sequencing of activities, development 
and evenness of quality learning throughout 
the system, connections needed to achieve 
success, and barriers that might preclude 
success (CPRE, 1995; Julian, Jones, & Deyo, 
1995). The systems view represents the 
integrity of the systemic reform. 

Finally, evaluation of systemic reform 
should serve the understanding and 
management of systemwide change through 
the value that it adds to development of the 
reform. Evaluation might provide warnings 
about potential challenges and failures; 
identification of opportunities to link with 
other efforts; and indications of schools, 
districts, or components of the reform that 
should be given extra attention at various 
times (O’Day & Smith, 1993; Ridgway, 
1998). The processes of evaluation ought to 
promote learning throughout the system 
through self-reflection against criteria that 
stakeholders trust (Corcoran, 1997; Goertz, 
Floden, & O’Day, 1996). 




Tracking Change Over Time 

Systemic reform is an endeavor that is 
expected to last many years. Evaluations of 
systemic reform must take this temporal 
dimension into account by providing evidence 
of and supporting judgments about the value 
of systemic reform over time (Clune, 1993). 
Although many researchers and evaluators 
agree that the full impact of systemic reform 
on any school, district, or state cannot be 
assessed for many years, evaluation can make 
critical contributions to tracking change over 
time from the outset of a systemic reform 
effort (CPRE, 1995; Goertz, Floden, & 
O’Day, 1996). 

First, evaluations should include needs 
assessments for the system as a whole and for 
different parts and functions of the systems; 
identification of important baseline indicators 
should follow from needs assessments (I. 
Weiss, 1997). From needs assessments and 
baseline indicators, evaluators will gain 
meaningful ideas regarding “what to look at” 
in order to track meaningful change. 

Second, given the systemic reform’s 
strategies and time lines, evaluations can 
make conjectures regarding when, where, and 
to what extent certain changes in the system’s 
infrastructure or its outcomes might be 
expected (Ridgway, 1998). From these 
conjectures, evaluators should derive insight 
regarding “where and when to look” for 
meaningful changes. 

Third, continuous evaluative feedback on 
implementation and early change will provide 
important information about opportunities and 
challenges that managers may use to reorient 
and reposition the strategic thrusts of the 
reform. Evaluations should continually 
determine “what lessons have been learned” 
that can aid development. Past and current 
evaluations have revealed that designs of 
systemic reform evolve considerably over 
time. Evaluations can make a significant 
contribution to the ongoing design of systemic 
reform efforts if they alert key audiences to 
the greatest needs and the most potentially 
beneficial opportunities at various times in the 
life of a systemic reform effort (Heck & 
Webb, 1996; Ridgway, 1998). 



Evaluators who examine change over 
time should never lose sight of two vital 
principles underlying systemic reform. First, 
change itself is not the intent of systemic 
reform; valuable development toward 
ambitious learning goals is the aim. To 
provide evidence regarding how a system has 
developed with respect to desired effects and 
impacts will be far more powerful and useful 
than merely demonstrating that the system has 
changed (Heck & Webb, 1996; Rowland, 
1994). Second, in evaluation of systemic 
reform, assigning blame or praise for past 
action should remain distantly subordinate to 
reflection, learning, and guidance for future 
development. Systemic reform is about 
growth toward a vision of the future and 
evaluation primarily ought to track change 
and inform development toward that vision 
(Banathy, 1995; Rowland, 1994). Both 
evaluators and reformers need to have a view 
of the past, the present and the future in order 
to understand, first, how the present state of 
the system represents learning from the past; 
second, how the present state relates to the 
idealized future vision of learning; and third, 
how the systemic reform might make the 
future vision possible. 

Summary 

Evaluation of systemic reform can serve a 
number of important purposes; most 
evaluations will address several purposes at 
once. Three critical purposes that evaluation, 
particularly developmental evaluation, can 
serve well have been introduced, and two 
have been highlighted. First, evaluation of 
systemic reform should aid stakeholders’ 
understanding and management of 
systemwide change. Systemic reform 
generally involves multiple, interacting 
components targeting different functions in 
numerous local sites. Evaluation can be a 
valuable tool for managing the developmental 
challenges of such complex efforts if it 
consistently represents the totality, 
complexity, and integrity of systemic reform. 
Second, evaluation of systemic reform should 
track change over time. The long-term, 
progressive nature of systemic reform 
demands that reformers and stakeholders 
understand how the reform develops toward 
the vision that guides it. Well-designed 
evaluation can trace the development and 
adaptation of the systemic reform to changing 
conditions and ideals. Third, although not 
directly addressed in this paper, evaluation 
can help build and test the theory of systemic 
reform. The theory of systemic reform is the 
means by which to describe, interpret, and 
learn from enactments of systemic reform. 
Evaluation can be a vehicle facilitating 
healthy development and interplay between 
theory and action. 

OVERHEADS USED 

Purposes and Vision for 
Evaluation of Systemic Reform 

Daniel J. Heck 
University of Illinois at Urbana-Champaign 
NISE Forum, 1999 

Evaluation and Systemic Reform 

Evaluation “tends to ignore the social and 
institutional structures within which the 
problems ... are generated and sustained.” 

“Most of the political implications of 
evaluation have an establishment 
orientation. They accept-and bolster-the 
status quo.” (Weiss, 1991) 

Developmental Evaluation 

“Evaluation processes and activities that 
support program ... development. The 
evaluator is part of a team whose members 
collaborate to conceptualize, design, and 
test new approaches in the long-term, 
ongoing process of continuous 
improvement, adaptation, and intentional 
change” (Patton, 1994) 



Purposes for Evaluation of 
Systemic Reform 

• To aid understanding and management of 
change throughout the system 

• To track the nature and extent of change 
over time 

• To build and test a theory of systemic 
reform 



Understanding and Managing 
Systemwide Change 

Perspectives 

• Whole System View: System, Reform, and 
their Intersections 

• Holistic View: How each activity or 
objective relates to primary 
goals and overall vision 

• Systems View: Totality and complexity of 
reform as a united force 



Understanding and Managing 
Systemwide Change 

Quantitative Techniques 

• Hierarchical Linear Modeling 

• Structural Equation Modeling 

Qualitative Techniques 

• Nested Case Studies 

• Case-Ordered Displays and Analyses 

Tracking Change Over Time 

Perspectives 

• Baseline Indicators: What should we look at? 

• Conjectures, Projections: When and where do 
we look? 

• Feedback on Implementation: How do things 
look now? 

• Ongoing Design: What do we want things to look 
like in the future? 



Tracking Change Over Time 

Quantitative Techniques 

• Time Series Analyses 

• Repeated Measures Designs 

• Hierarchical Linear Modeling 

Qualitative Techniques 

• Longitudinal Case Studies 

• Time-Ordered Displays and Analyses 



Summary 



• Developmental evaluation is an approach 
that is well-matched to important purposes 
for evaluation of systemic reform 

• Evaluation should aid in management, 
leadership, and understanding of systemic 
reform 

TRACKING THE THEORY OF CHANGE: A MOVING TARGET 
Evaluation of Systemic Reform in Mathematics and Science 

Zoe A. Barley 

Western Michigan University 



Background 

Education’s move toward systemic reform 
arose from prior failed efforts to improve 
educational outcomes for students by focusing 
reform separately on changing teaching, 
instructional materials, or curricula. While 
each of these might have changed initially, the 
larger school context eventually defeated 
realization of the desired improvement in 
student outcomes. In some cases, change 
itself became impossible given the barriers, 
e.g., policies, procedures, or resource scarcity, 
presented by the school context. In other 
cases, gains in one area were offset by losses 
in another. Educators and researchers came to 
see these barriers, not as isolated issues, but as 
part of a system of education. All of the 
pieces, including roles and relationships of 
the persons involved, policies and procedures, 
and resources and capacities, needed to be 
understood as they interrelated to support or 
defeat reform. Comprehensive systemic 
reform underscores the necessity for 
reformers to consider all aspects and 
influences that finally determine how students 
develop the knowledge and skills desired. 
This extends the reform to include parents and 
community and to other levels of influence 
such as higher education, state educational 
policy makers and federal policies and 
programs. Comprehensive systemic reform is 
now understood to be the essential strategy for 
educational reform. 

The National Science Foundation (NSF) 
was an early and strong supporter of systemic 
change initiatives. Through its Systemic 
Initiative (SI) programs, states (SSIs), rural 
(RSIs), and urban (USIs) areas were funded to 
conduct systemic reform with an emphasis on 
improvement in mathematics and science. 
Each of these initiatives was required to have 
an external evaluator, and for the statewide 
initiatives, SRI International was contracted to 
conduct a national evaluation. Many of the 
external evaluators attended biannual 
conferences held to support networking 
among Principal Investigators and Project 
Directors of the state initiatives. These 
furthered the dialog among evaluators about 
issues in the evaluation of systemic reform. 
Eventually evaluation issues led to an SSI 
evaluation workshop (February 1994) on 
using logic models (Rog, 1994) to link 
evaluation and key SSI strategies. 

Logic models are an offshoot of Chen’s 
(1990) work on theory-driven evaluation. 
Theory-driven evaluations seek to elucidate 
the program theory, “the set of interrelated 
assumptions, principles, and/or propositions 
to explain or guide social actions” (Chen, 
1990, p. 40) that the program designers have 
in mind, consciously or unconsciously, as 
they develop and implement the program 
being evaluated. Program theory became 
more complex with the move to systemic 
reform, encompassing the entire system 
relevant to the desired outcomes including 
context, presumed causal factors, mediating 
factors, and the interventions or program 
activities themselves. Schon (1997), in his 
work on program theory, noted the 
importance of paying attention not only to the 
espoused theory, the originally developed 
understanding, but also to the theory of action 
(or theory-in-use), which emerges as the 
program or reform is implemented.¹ 

Some SSIs developed logic models for 
their state systemic reforms as a result of the 
workshop, and the evaluators, to a greater or 
lesser degree, used these in shaping the SSI 
evaluations. This paper reports on one 
evaluation team’s experience with the use of 
theory-driven evaluations for several systemic 



1 Schon distinguishes a third theory, the “design 
theory,” which emerges as the espoused theory 
gets concretized in budgets and planned actions. 
reform initiatives, including two SSIs, an 
OERI-funded advanced technology grant, and 
a privately funded urban systemic reform. 

Critical Factors in the Efficacy of Systemic 
Reform Initiatives 

The first task of a theory-driven 
evaluation is to give form and specification to 
the theory, drawing upon program 
documentation and directed conversations 
with the program directors. A theory-driven 
evaluator needs to have a good understanding 
of best thinking in the program area in order 
to describe the program theory and to identify 
gaps in the logic or misperceptions in what 
will accomplish the outcomes. For systemic 
reforms, given their complexity, this need is 
even more important. The NSF Office of 
Systemic Reform program staff developed 
definitions for a set of eight elements of 
systemic reform and six “drivers” that NSF 
perceived to be essential in moving reform 
forward. Other documents that emerged to 
define systemic reform for SSIs included “A 
Continuum of Systemic Reform” developed 
by Beverly Anderson (Education Commission 
of the States, 1992) and SRI’s concept model 
for the SSI national evaluation (SRI, 1992). 
These additional documents, as well as 
emerging research about successful systemic 
reform initiatives, were useful in identifying 
gaps or misperceptions in the program theory 
of the reform. 

Chen (1990) makes the distinction 
between “normative theory,” what the 
structure of the program should be 
(prescriptive), and “causative theory,” what 
the underlying causal mechanisms actually are 
(descriptive). The normative theory is what 
the evaluator finds in examining the program 
documents and talking with program 
directors. It usually has come from 
unexamined premises or prior experience. The 
causative theory is empirically based and 
comes from the relevant literature. The 
evaluator explicates both theories in order to 
design a theory-driven evaluation. The 
normative theory assesses the consistency of 
the actual program activities in relation to the 
intended intervention and shapes the 
evaluation of the implementation. The 
causative theory assesses both the impact of 
the program and how the impact was 
generated and shapes the summative 
evaluation. 

Three distinctive features of systemic 
reform influence the design of the evaluation. 
Inevitably inherent within the understanding 
of the program theory is a set of values. 
Minimally, the intended outcomes are valued 
for those who will participate in the program. 
Systemic reform also includes value at a 
systems level, the value of continuous 
improvement, a self-renewing process in 
which the system makes corrections in its 
strategies and processes to enhance the 
attainment of desired outcomes. Such a 
process entails the collection, analysis, and 
interpretation of data as a part of system 
functioning. While the external evaluation 
could operate entirely independently of the 
internal evaluative process, more typically the 
evaluators have operated in a supportive role, 
providing technical assistance for the internal 
evaluation process while gaining data useful 
for the external evaluation. Close 
collaboration of the two efforts reduces the 
data burden on the elements of the system and 
maximizes the information available to 
program directors and funders. This first 
feature, the collaborative role, is then 
reflected in the evaluation design. A second 
typical characteristic of systemic reforms is 
the press for stakeholder involvement at all 
stages. Stakeholder involvement in the 
external evaluation is best served if a 
representative group of stakeholders is 
involved beginning at the design stages. 
Finally, systemic reform is a long-range 
process. Evaluators must identify or develop 
intermediate benchmarks as a means to assess 
whether the reform is progressing prior to 
expected changes in ultimate outcomes. This 
third feature thus adds another dimension in 
evaluation design. 

Developing a Theory-Driven Systemic 
Reform Evaluation 

As the evaluator works to make explicit 
the theory or logic of the systemic reform, 
Schon (1997) suggests a series of pertinent 
questions: Is the design theory congruent with 
the espoused theory? (Do we design what we 
espouse?); Is the theory-in-use congruent with 
the design theory? (Do we enact what we 
have designed?); Are the theories internally 
consistent?; and, Is a given theory of action 
effective in the sense that its strategy yields 
the desired outcomes? One way to present the 
design theory is what is known as a logic 
model, a conceptual representation of the 
relationships among the relevant inputs, 
intervening factors, intermediate benchmarks, 
and interventions leading to the outcomes. 

The logic model is developed based on the 
data at hand. In the early stages, this data 
would come from planning documents, early 
implementation pieces, and interviews with 
program staff. Presuming the evaluator is on 
board as the detailed planning is conducted, 
the logic model can not only assist the 
program planners in identifying program 
design problems but also serve the evaluator 
as an evaluability assessment 
(Wholey, 1979), an analysis of whether the 
reform is sound enough in design to warrant 
an evaluation. The next steps for the evaluator 
are to enter into a dialogue with program staff 
about the logic (or absence of logic) in the 
planned work, to develop the collaborative 
relationship with internal evaluators, and to 
engage other stakeholders in final aspects of 
the evaluation design. 

In one systemic reform, we found that the 
program staff, who were dedicated program 
activists, lacked interest in the logic model 
approach and could not grasp the importance 
of assessing whether the detailed plans would 
actually realize the outcomes their espoused 
theory promised. For the first few years of 
this initiative, the implemented program was 
actually a multiplicity of separate programs 
with little or no “systemicness” about the 
work. The desired outcome of a reformed 
system did not occur despite the realization of 
many useful outcomes for individual teachers. 
In another systemic reform initiative, the 
program staff worked hard at developing the 
logic model. In the process they realized that a 
number of related agencies would need to be 
brought on board if the reform were to 
succeed. Building this broader base of 
involvement became a key strategy and the 
interconnections an important intermediate 
outcome. 

In both cases cited above, the evaluation 
was designed using a combination of the 
theory of action of the program implementers 
and the causative theory, the best thinking 
available about systemic reform initiatives. 

The logic model, even when not seen as 
useful, served as a means to negotiate 
evaluation emphases for the evaluation 
design. For the first case, while system change 
was tracked, the emphasis was on the 
particular results of various program efforts. 
For the second case, much more emphasis 
was placed on changes in the system 
(policies, procedures, extant programs) and in 
the connections among the key system 
elements. Interestingly, funders were more 
interested in specific program outcomes than 
in system changes despite espousing systemic 
reform. 

Implementing a Theory-Driven Evaluation 

Because the development of logic models 
came after the initiation of the evaluations for 
the SSIs, the models represented the theory 
after it had been significantly modified as a 
result of the management team better 
understanding what systemic reform entailed. 
In another case, the evaluation of an advanced 
technology grant, we were able to initiate the 
theory-driven approach including a logic 
model as the program began. The program 
plan entailed installing equipment centrally 
and in classrooms and setting up electronic 
connectivity for all teachers across 5 districts 
in a single county. The desired outcomes 
included student achievement improvement, 
teacher retention, and teacher and student use 
of electronic support for teaching and 
learning. The logic model revealed gaps 
between the planned work and the desired 
outcomes. When this was revealed, the 
consortium sought GOALS 2000 money to 
institute professional development for the 
teachers in the use of technology and to 
establish a common curriculum to foster 
connections among teachers. While gaps 
remained in the “logic” of the work, the 
additions suggested the program warranted 
evaluation (evaluability assessment) and the 
relationship forged with the program 
implementers created a continuous learning 
mode allowing data collected by the 
evaluation to continue to inform program 
decision-making. 

For the privately funded urban reform, a 
logic model was developed as part of the 
response to the RFP for the evaluation. 
During the six-month evaluation-planning 
period the model was further refined with 
input from the community-based Evaluation 
Committee. For the reform to be successful, a 
delicate balance among business, strong 
locally focused foundations, community 
activists, and the school district had to be 
worked through. The presenting issues were 
always couched in who had authority or 
power and what access non-district groups 
and persons would have to district decision 
making. Yet the goals of success for all 
students were commonly held by all. The 
original model had to be rethought to better 
represent untested theories about a community 
role in a large hierarchical urban district as a 
necessary precursor to radical rethinking of 
teaching and learning. As evaluators 
rethought the logic model, a reallocation of 
evaluation resources was necessary. More 
effort was expended than first planned in 
monitoring district policies and personnel 
changes and in observing the emerging 
relationship between key community 
leadership and the central administration of 
the district. The establishment and 
institutionalization of a set of common 
success indicators to be measured by the 
district but developed collaboratively and 
monitored by the evaluators was one 
important coming together. Evaluation 
resources were also redirected to support this 
collaborative work, a fairly labor-intensive 
effort in engaging sometimes hostile 
community members with isolated and 
defensive district assessment personnel to 
come to consensus on the definition and 
measurement of the indicators. In this case, 
ongoing reflection on the theory of action held 
by the various parties enabled the evaluation 
team to better focus its work and interpret the 
data collected. 

Some Concluding Thoughts 

Thus, a theory-driven approach to 
systemic reform evaluation may ultimately be 
of greatest use to evaluators in shaping 
reflection and focusing their work. It does, 
however, especially in conjunction with a 
logic model, offer a better way for reformers 
to graphically understand not only the results 
they seek in relation to the strategies they 
undertake, but also the multiplicity of factors 
in the larger system in which they operate. 

As a part of the evaluation design, data 
collected on intermediate benchmarks provide 
early checks on whether the reform is on 
track. A collaborative relationship with 
system reformers influences the role 
evaluators play, for example adding technical 
assistance and consensus building, but it also 
can result in better and more efficient data 
collection as responsibility is shared and 
results are mutually beneficial. Evaluators also 
gain tools for their own reflection from a 
theory-driven approach including an early 
consideration of whether a reform warrants an 
evaluation effort. 

OVERHEADS USED 



Requirements for the 21st Century 



• An interdisciplinary environment that challenges the way we organize 
classrooms, subjects and knowledge in schools and colleges 



• A curriculum that stresses lifelong skills such as learning how to learn 
instead of rote learning 

• Teachers who take on different roles: not only lecturing, but also 
coaching, role playing, facilitating 

• An environment that encourages students to take more responsibility 
for their learning, becoming active participants instead of passive 
recipients of information 

• Improved forms of assessment, including portfolios, exhibitions and 
demonstrations 



• Creative use of time and space 



• More individualized instruction, i.e., methods that respond to students’ 
individual learning styles 



• More diverse ways of organizing and presenting information 



• More decision-making autonomy given to those persons closest to the 
problems 

From Introduction to Systemic Education Reform • EDC 






REFLECTIONS ON SYSTEMIC REFORM EVALUATION: 

NISE CONFERENCE 

Issues in Evaluating Systemic Initiatives: 

If it is intended that the evaluation itself model/embody a systemic 
approach, the following ensue: 

• the list of constraints is not just doubled but squared, given the 
confounding effects 

• the complexity of the evaluation constantly challenges the ability to 
focus 

• expectations of stakeholders about the criteria for evaluating, the role 
of evaluators, their roles given shared decision-making, etc., are difficult 
to sort out and meet 

• the political aspects of the evaluation are not only the context but also 
influence the release and use of findings 

• new roles are required of evaluators who already lack experience in the 
task at hand 



Problems in Evaluating Systemic Initiatives: 

• getting program implementers who are activists to use evaluation 
findings in making decisions about revising program directions 

• teaching/instilling a mindset and methods for using evaluation data 
from a non-teaching platform 

• operating within a "learning community" as an evaluator yet member 
of the community 

• if evaluation recommendations are adopted, evaluators are evaluating 
their own program directions 

• allocation of resources in a complex, messy, emergent design 
evaluation is a constant issue 

QUESTIONS TO GUIDE THE EXPLICATION OF LOGIC 

(Schon, 1997) 

Is the design theory congruent with the espoused 
theory? 



Is the theory-in-use congruent with the design theory? 



Are the theories internally consistent? 



Is a given theory of action effective in the sense that 
the strategy yields the desired outcome? 



Panel L: Tracking the Theory of Change: A Moving Target 

NISE 1999, Zoe A. Barley, SAMPI, Western Michigan University 

Exhibit 2. Concept Map and Major Evaluation Questions: Schools of the 21st Century 

EVALUATING SYSTEMIC REFORM: A COMPLEX ENDEAVOR 



Iris R. Weiss 
Horizon Research, Inc. 



At first glance, evaluation of systemic 
reform efforts is a lot like evaluation of any 
reform effort, where information is collected, 
analyzed, and interpreted in order to: (1) 
improve the project and/or (2) assess its 
impact. But the fact that systemic reform
efforts are charged with aligning the many
components of the education system
makes the evaluation efforts substantially more
challenging.

Leaders of mathematics and science 
education reform efforts, both “systemic” and
“non-systemic,” typically begin by laying out 
the needs they are addressing. They use their 
understanding of the system, in concert with 
their knowledge of what works best in a 
particular context, to design a set of 
interventions. Evaluation can begin very early 
in the reform process, with evaluators using 
their knowledge from research and prior 
experience to critique the design of the 
initiative and suggest areas in need of 
refinement, but relatively few projects use 
evaluators in this role. More typically, once 
the reform begins, evaluators monitor the 
quality of the implementation and seek 
evidence of its impact. By sharing the results 
of the evaluation with key project 
stakeholders, evaluators hope to contribute to 
continued improvement of the reform design 
and implementation. By documenting areas of 
impact (and lack of impact), evaluators hope 
to inform future policy and program 
decisions.

In reforms of an individual component of 
the education system, evaluators can use time- 
honored evaluation strategies with reasonable 
confidence. In contrast, in evaluating systemic 
reform efforts, evaluators often feel like we 
are making it up as we go along. The 
following sections describe the (relative) ease 
of evaluating a traditional intervention, and 
the increased complexity involved in 



evaluating multi-faceted systemic reform 
initiatives. 

Evaluating “Traditional” Reform Efforts 

For many years, mathematics and science 
education reform efforts focused on individual 
components of the system. For example, 
summer institutes were offered at colleges and 
universities to help in-service teachers deepen 
their content knowledge. Evaluations of such 
efforts looked at the quality of the institutes in 
relation to that goal (e.g., to determine if the 
content was important for teachers to 
understand and was presented in a way that 
was accessible to those teachers). They found 
out from teachers whether the institutes had 
impacted their feelings of preparedness. On 
occasion, they might use pre- and posttests to 
determine more objectively if teacher 
knowledge had increased. The evaluators 
might have used their experience with similar 
programs to critique the design ahead of time, 
and they would likely provide formative 
feedback as the project unfolded. However, 
since interventions typically were not 
attempting (or likely) to change the 
educational system beyond impacting teacher 
preparedness, there was no reason for 
evaluation efforts to focus on the larger 
system, or for the evaluators to provide 
feedback in regard to changing other parts of 
that system. 

Evaluating “Simple” Systemic Reform 
Efforts 

By definition, systemic reform efforts 
address multiple components of the education 
system. As a result, the evaluator’s job 
expands beyond tracking the implementation 
and impact of the specific project activities to 
looking at other elements of the system that 
may affect the extent to which the project
achieves its goals. Since everything in a 
system is connected to everything else in that 
system, it is often unclear where to draw the 
line in deciding what is and is not to be 
addressed in the evaluation, especially when 
projects funded as systemic are only 
minimally so. 

What is clear is that even “simple” 
systemic reform efforts introduce 
complexities for evaluation. Consider the case 
of the National Science Foundation’s (NSF) 
Local Systemic Change through Teacher 
Enhancement (LSC) initiative, which 
emphasizes professional development around 
exemplary instructional materials. Assume 
that in a particular district the key needs have 
been identified as: 

• Many elementary teachers lack science 
content knowledge, and 

• Elementary teachers are not prepared to 
use the instructional materials the 
district has chosen. 

The literature on systemic reform led the 
project staff to: (1) design professional 
development programs in which the 
elementary teachers could learn content and 
pedagogy in the context of the designated 
instructional materials; (2) work with 
principals to ensure that they do not derail the 
reform efforts; and, (3) provide support for 
teachers as they attempt to implement the new 
instructional materials in their classes. 

Obviously, the fact that the project 
activities address more components of the 
system than simply teacher content 
knowledge increases the scale and complexity 
of the evaluation, which now must focus on 
several areas. Less obvious is the fact that the 
evaluation must now also consider parts of the 
system not directly addressed in the project 
plan. For example, in critiquing the project 
design, evaluators might suggest the need for 
a materials management center, noting that 
other elementary science projects they have 
evaluated have floundered when teachers had 
to deal with re-supplying consumables. 
Similarly, in looking at impact, the evaluation 
would need to consider issues of 



sustainability, i.e., whether there was a system 
in place for the district to continue the 
professional development after the grant. 
Consequently, not only will the evaluation of 
a systemic reform effort require resources 
beyond those needed to evaluate a similar size 
traditional reform effort, but it will also 
require evaluators with a broader and deeper 
understanding of educational systems. 

Evaluating “Complex” Systemic Reform
Efforts 

Many systemic reform efforts are 
considerably more complex than the LSC 
example, and the ensuing challenges for 
evaluation increase correspondingly. Let’s 
look at how increasing the complexity of a 
systemic reform effort complicates the 
evaluation, both in the “design critique” stage 
and in the evaluation of the project’s 
implementation and impact. 

Design Critique 

The LSC solicitation specified that the 
reform was to emphasize professional 
development. At the same time, projects were 
asked to situate that professional development 
in a systemic context so that other aspects of 
the system did not negate the impact of the 
professional development. (For example, if
the district assessments were not consistent 
with the content and approach of the new 
instructional materials, teachers would be less 
likely to change their instruction.) Once the 
subject and grade range for the intervention 
were designated, and the goals identified, the
evaluation would be targeted to that subject, 
that grade range, and those specific goals. The 
evaluators did not need to focus on whether it 
would have been better to spend project 
resources on pre-service education, revising 
the high school science or mathematics 
curriculum framework, or any of a myriad of 
interventions that were outside the scope of 
this initiative. 

In a more complex systemic reform effort, 
leaders’ attempts to “understand the system” 
could well generate a lengthy list of needs — 
for professional development at every grade 
range; for an articulated K-12 science,
mathematics, and technology curriculum; for
improved instructional materials; for
assessments aligned with reform; for
administrator support; for higher expectations 
for all students; for community support; for 
replacing antiquated laboratories; for 
improving pre-service preparation. 

If systemic reform theory were well- 
developed, project staff would have some 
direction for deciding how much priority to 
give each need and in what sequence, and 
evaluators would have a sound basis for 
critiquing the project design. But systemic
reform theory is exceedingly thin, specifying
overall goals, but providing little guidance on
how to go about meeting those goals. There is
a bewildering array of options for 
intervention, and often as many opinions 
about the most effective strategies as there are 
stakeholders. In a simpler systemic reform 
effort, the evaluator’s critique would likely 
help the project improve its design. In a more
complex endeavor, the evaluator’s voice may
simply add to the confusion.

Implementation Evaluation 

Every systemic reform initiative 
eventually settles on a course of action, a 
subset of the seemingly infinite number of 
activities which have the potential to address 
the system’s needs. In most cases, however, 
that subset is still much more than the 
evaluators can possibly monitor within the 
time and resources available. Typically, the 
resources devoted to evaluation in systemic 
reform efforts would be more appropriate for 
investigating the quality and impact of two or 
three components, not the dozen or so that are 
generally included. 

In the best circumstances, project staff 
help decide where to target evaluation 
resources both by being clear in 
communicating the project strategy and in 
specifying the programmatic decisions that 
will need to be made. In the more typical case, 
project staff want an in-depth look at 
everything, or the various stakeholders are 
interested in different parts of the initiative or 
different parts of the system. The process of 



reaching consensus requires extended, 
sometimes seemingly endless negotiations. 
Eventually data collection begins, whether 
according to an agreed-upon evaluation
design or more haphazardly, simply because
the clock is running and evaluators need to 
have something to report. At this stage, it is 
possible to pretend that this is a typical 
evaluation, proceeding to review project 
documents, observe project events, interview 
participants, talk with key stakeholders in the 
system, etc. 

Typically, the plot thickens when it is 
time to provide formative evaluation 
feedback. In a simpler project it might be 
appropriate to report results only to the 
Principal Investigator (PI), but the 

collaborative approach inherent in complex 
systemic reform efforts suggests the need to 
communicate with a larger group. In fact, 
even if the systemic reform has a single 
dominant leader, it is a good idea to 
communicate evaluation findings more 
widely. Intentionally or otherwise, the PI may 
filter the information, put a “spin” on it, or use 
the results in some other way that seems 
counter to the best interests of the initiative. 

To avoid this problem, we have learned to 
provide feedback in writing simultaneously to 

the project’s entire management team, 
typically 3-10 people, leaving it up to them to 
decide who else should get the report and 
when. 

In any evaluation, but especially in 
complex systemic reform efforts, there is an 
additional problem in finding the appropriate 
point in the balance between simply reporting
findings and making recommendations for
a major redesign to increase the likelihood of 

impact. At one extreme, the project is 
deprived of the insights of skilled, 
experienced people who understand the 
project goals and context deeply and well. At 
the other extreme, those same skilled, 
experienced people could be perceived as 
taking over the project, and in turn, evaluating 
themselves! 

A related challenge is presenting 
information in a way that will help the project 
move forward. In the ideal, project staff 
would have both the capacity and the will to 
make use of evaluation feedback to improve
the project design and implementation. The 
reality is, unfortunately, very far from that 
ideal, especially in complex systemic reforms. 

We have found a number of reasons why 
projects are unable to make mid-course 
corrections, even in the face of compelling 
evaluation results. In developing the initial 
reform plan, project leaders often had to 
negotiate with diverse stakeholders, and they 
may be concerned that any changes will 
jeopardize the sometimes fragile coalition that 
was established at that time. Alternatively, 
project staff may know how to do what they 
proposed initially, e.g., high-quality 
professional development, but not know how 
to go about whatever it is the evaluation 
results suggest they do instead, especially if 
the recommendations involve efforts in the 
policy arena. Finally, the turf issues that are 
present in any initiative seem to increase 
exponentially with the number of players; 
sometimes formative evaluation feedback in a 
large systemic initiative becomes just another 
round of ammunition for the political battles. 

Impact Evaluation 

Funders have their own constraints, 
including the need to provide evidence of 
program effectiveness to Congress or other 
policymaking groups. Unfortunately, this 
need often translates into pressure for the 
initiative to seek evidence of impact when the 
reform efforts are just beginning to be 
implemented. At some point, typically long 
before evaluators think it is reasonable to do 
so, the evaluation will begin to focus on 
evidence that the initiative has had its 
intended impacts on teachers and students. 
Again, there is likely to be far more to look at 
than is feasible with the available resources. 
The problem is complicated greatly by the 
lack of appropriate outcome measures, a 
situation that is even more problematic for 
systemic initiatives than for traditional reform 
efforts because of the need to demonstrate 
impact in order to justify the large 
expenditures. 

One difficulty is that systemic reform 
includes alignment of policy in support of the 



reform vision, but in most cases “alignment” 
has not yet been defined in measurable terms. 
Another difficulty, even in areas where there
are existing instruments, is the scarcity of 
measures that are simultaneously valid, 
reliable, and feasible on a large scale. Surveys 
and multiple-choice tests are open to criticism 
on validity grounds; classroom observations 
and performance assessments are open to 
criticism on reliability grounds, and so on. 
Finally, there are often problems in study 
design that threaten the credibility of the 
results. Unlike small-scale research projects, 
major systemic reform efforts rarely use 
random assignment of teachers and students 
to treatment groups, and appropriate 
comparison groups are difficult to find. 2 

Because of these and other complexities, 
some researchers have suggested that the 
question of impact on student achievement be 
addressed through carefully controlled 
research efforts rather than as a part of the 
evaluation of professional development 
interventions. 3 The reasoning is that if it can 
be demonstrated that students learn more
when teachers do more X and Y, then 
evaluation of a particular reform effort could 
determine if teachers are in fact doing more X 
and Y, and leave it at that. Politics aside, that 
advice might be heeded as a more efficient 



2 

Using “matched” districts/teachers/students 
might work if you chose the “right” matching 
variables, but the primitive state of systemic 
reform theory does not inspire confidence in this 
regard; it is entirely too likely that some 
unmeasured aspect of the context will make the 
two groups non-comparable. Choosing as yet 
“untreated” teachers/students in the intervention 
districts and schools helps avoid that problem, but 
introduces the possibility that these groups were 
influenced by policy reforms associated with the 
initiative. 

3 See, for example, George Hein, “The Logic of 
Program Evaluation: What Should We Evaluate in 
Teacher Enhancement Projects?” in Reflecting on 
Our Work: NSF Teacher Enhancement in K-6 
Mathematics (S. N. Friel & G. W. Bright, Editors). 
Lanham, MD: University Press of America, Inc.,
1997. 
use of evaluation resources. But the final
complexity of evaluating systemic reform is
that, as in the reforms themselves, there is
virtually no chance that politics will be set
aside for very long.
Introduction to Breakout Session I Question Summary

Each panel was followed by a Breakout Session. Participants 
were assigned to small groups of ten to twelve, led by a 
facilitator, in a discussion of three questions and other issues 
raised by the presenters. Each set of three questions was 
developed by the organizers of the Forum. At the beginning of 
the Breakout Session, participants were asked to write their 
responses to each of the three questions on index cards. The 
comments that the participants wrote were used to begin the
small group discussions. The index cards were given to two 
people, who provided a synthesis of the conference; comments 
on the index cards were incorporated into their comments. 
Responses to the first question are summarized here to 
provide examples of participants' comments. 
Breakout Session I: Defining the
Problems of Evaluating Systemic Reform 

Participants’ Comments: 

Q: What are the main issues that need 
to be considered in evaluating 
systemic reform? 

Participants raised some 
important points about what needs to be 
considered in evaluating systemic reform. 
Although a number of issues were 
identified, a few were repeatedly cited by a 
number of the 175 participants who 
responded. These are listed in order, from 
issues of greatest concern to those less 
frequently raised. 

1. Clear definition of system, systemic
reform, and relevant components. More
than 20% of the participants noted the
importance of being clear about what is
being evaluated. This requires defining 
what system is being reformed, 
including how its boundaries are 
defined, and determining what is within 
the system and what should be 
considered outside the system. Several 
of the participants thought it important 
to clearly identify the system 
components, what is meant by 
components, and what the 
interconnections among components 
are. One participant noted the need for a 
common framework that can be used to 
analyze different components, including 
curriculum and policy. Others 
emphasized the need to specify clearly 
what systemic reform is and what the 
parameters are for a system that is acting 
systemically (e.g., How much coherence 
is enough? When is a curriculum 
standards-based? When is a system 
serving all students equitably?). A few
extended this thought by asking about 
how to conceptualize systemic 
evaluation as a complex system, in and 
of itself, that is in turn embedded in a 
complex system. 



2. Student achievement. More than 10% 
of the participants identified the 
measurement of student achievement and of 
systemic reform’s impact on student 
achievement as an important issue. A 
typical comment was “Bottom line,
students: how does [systemic reform]
work?”

3. Ways to work with dynamic and
complex systems. More than 10% of the 
participants raised the issue of studying 
education systems that are dynamic, change 
over time, and are complex. Others raised 
questions about managing the scale of a 
large system and selecting the most 
appropriate operational variables to 
evaluate the entire system. One participant 
indicated that the evaluation design for 
systemic reform has to be responsive to the 
dynamic nature of the system. 

4. Means for determining attribution and 
cause and effect. About 5% of the 
participants noted that attribution and 
causality were important issues. A few 
questioned whether it was necessary to 
judge attribution. One participant indicated 
the possibility of assessing only a fractional 
part of the impact by analyzing the variety 
of connections between inputs and outputs. 
Another participant felt that attributing 
effects to an initiative would require 
studying the cognitive processes of students 
and teachers in the learning process. 

5. Responding to and identifying 
audiences and stakeholders. More than 5% 
of the participants made some reference to 
the relationship of the evaluation and its 
findings to appropriate audiences. 

Participants sought clarification on what 
would be meaningful to different audiences 
(e.g., initiative personnel, funders, and 
policy makers), how results and data should
be interpreted, and how feedback of
results should be varied. A few participants
questioned how to evaluate the “buy-in” by 
stakeholders and the commitment of all 
constituencies to achieving the desired 
outcomes. One participant raised the
question about how stakeholders should 
be viewed and whether they should be 
considered participants. 

6. Other questions raised by more than 
one of the participants were: What are 
the critical indicators of success that 



should be measured? How do we convey 
the importance of creating logic models and 
reconciling the research design with the
theory/logic of systemic change? How can
reform be evaluated fully when there is a
misalignment between the existing 
assessments and the goals and objectives of 
reform? 
Panel II: Models and Approaches to Evaluation of Systemic Reform

Panel Papers and Authors: 

Critical Elements of an Evaluation of Systemic Reform
Patrick Shields, Andrew A. Zucker, and Nancy E. Adelman, SRI International

Evaluators’ Roles: Walking the Line Between Judge and Consultant
Jeanne Rose Century, Education Development Center

Assessing Student Outcomes
Norma Davila, University of Puerto Rico

Understanding the Value of NSF’s Investments in Systemic Reform
Mark St. John, Inverness Research



Discussion Summary and Commentary: 
Models and Approaches to Evaluation 
of Systemic Reform 

Norman L. Webb 

Panel II presented considerations and 
issues related to models and approaches for 
evaluating systemic reform. The four speakers 
discussed critical elements of an evaluation of 
systemic reform, the evaluator’s role, 
assessing student outcomes, and 
understanding the value of NSF’s investments 
in systemic reform. 

Patrick Shields, a lead researcher of the 
SSI program evaluation conducted by SRI 
International, presented the conceptual model 
the evaluation team developed to specify the 
key components of an educational system that 
need to be reformed in concert. SRI based its 
model on the conceptualization of systemic 
reform as specified by Smith and O’Day
(1991) and others. In the shape of a pyramid, 
with student outcomes at the apex and 
standards and institutional collaboration and 
leadership at the foundation, the model is 
deceptively simple. Based on clear standards 
for what students should know and be able to 
do, and with the support of the key leadership, 
states and districts must align policy, build 
capacity to provide schools and teachers with 
needed human and material support, 
restructure incentive systems, and build 
professional and public support for the reform 
agenda. These actions in turn are meant to 



provide the support needed to help increase 
teachers’ capacity to implement the reform 
vision with access to appropriate materials, 
within schools organized to support their 
efforts, and with the support of parents and 
community. Such synergy will produce 
reformed classrooms and increased student 
learning. 

The evaluation used quantitative data 
gathered annually from SSI principal 
investigators, repeated site visits to each SSI, 
and reanalysis of data sets gathered by many 
of the SSIs. Shields noted what proved most 
useful from their approach to the evaluation. 
Their evolving model of systemic reform was 
helpful in guiding their inquiry and in 
facilitating cross-case comparisons. Twelve 
in-depth case studies of total state systems 
effectively provided detailed descriptions and 
analyses of the progress of the individual 
SSIs. The evaluation team identified eight 
specific state strategies that aided in the cross- 
site analysis and in assessing strengths and 
weaknesses of sites’ focusing on specific 
components. Shields also noted what they 
attempted that worked less well. An attempt 
to develop a common survey to compare 
classroom data was found to be too difficult. 
This forced the SRI team to rely on case study 
and state-selected evaluators’ data, which 
varied in quality, to decipher classroom 
impact and to assess student learning. 

Reducing the complex stories of the 26 SSIs 
into a concise summary, as requested by NSF, 
and ranking the progress of individual states
was found to be daunting. In the end, a report
was produced without state-by-state rankings. 
It proved impossible to identify a small 
number of models of systemic reform. 

Instead, the SRI evaluators resorted to
identifying the multiple strategies used by the
SSIs in their different contexts.

Jeanne Rose Century, a researcher at the 
Education Development Center, makes the 
case that as systemic reform calls for new 
roles for school administrators and teachers, 
the evaluator’s role also is subject to change. 
As the trajectory of the field of evaluation 
during this century has evolved, so has the 
concept of evaluator from that of a technician 
to a more expanded role including advisor, 
collaborator, and coach. Century identified 
some specific roles for evaluators of systemic 
reform. Because of systemic reform’s 
complexity, evaluators need to serve a 
multiplicity of roles and be versatile. The 
dynamic nature of systems forces those who 
are judging the value of reform to be flexible 
and to easily move in and out of specific 
roles. She argues that the goals for systemic 
reform fall within two domains: (1) improving 
educational practices and outcomes, and (2) 
building capacity. The success of systemic 
reform depends on instructional change and 
sustaining improvement through on-going 
reflection and reevaluation. An evaluator who 
identifies insufficient capacity within the 
system for it to achieve its goals may be in a 
position to provide technical assistance and 
may even be requested to do so by project 
leadership. Whereas in the past the tenet of 
independence and objectivity would inhibit an 
evaluator from providing technical assistance, 
the evaluator’s understanding of the system 
may mean that he or she is in the best position 
to assist. The role an evaluator serves is 
shaped by many factors. In a systemic reform 
context, an evaluator’s responsibility to a 
specific program or its staff may appropriately 
take precedence over the traditional 
constraints imposed on scientific objectivity. 

Norma Davila, of the University of Puerto Rico and
evaluator of the Puerto Rico Statewide 
Systemic Initiative, discussed alternative ways 
to measure student academic achievement 
within the new parameters of systemic 
educational reforms. Achievement of
challenging academic standards, as indicated 
by improved student academic achievement, 
is a central focus of the Puerto Rico SSI. The 
evaluation design of the Puerto Rico SSI was 
based on a participatory-research approach for 
evaluation, in general, and to assess student 
academic achievement, in particular. 
Evaluators triangulated their findings using 
multiple quantitative and qualitative methods. 
The evaluation employed three levels of 
assessment. Teachers trained in authentic 
assessment strategies used these assessment 
results to modify their practices and to 
monitor improvement in student academic 
achievement. The SSI staff developed a series 
of standards-based tests in science and 
mathematics closely aligned to classroom 
practices that were used to measure change
in student achievement prior to teachers’
participation in the SSI training and after 
completion of this training. An external 
measure was used as the third level of 
assessment. National Assessment of 
Educational Progress (NAEP) tests were 
adapted and translated to compare 
performance of students in schools 
participating in the SSI with those not 
participating. 

Over time and as the SSI evolved, a more 
credible assessment was needed. The SSI 
staff, in an alliance with the College Entrance 
Examination Board (CEEB), developed 
assessments based on items from NAEP and 
the Third International Mathematics and 
Science Study (TIMSS) to measure 
achievement gains over one year. These tests 
were administered to all students from 377 
schools in grades 4, 8, and 11. The scores
from these assessments were scaled using the 
TIMSS scales so that the scores could be 
compared to international benchmarks. 

Similar, but not identical, assessment items 
were used to provide professional 
development to teachers on students’ common 
misconceptions and how they can be
corrected. Teachers in these sessions used the 
assessment items as a basis for examining 
their own performance. In addition to student 
assessment results, enrollment in higher 
education of students from SSI schools also 
was being used as an outcome indicator.
Although the SSI was comfortable with the 
original three levels of testing, the national 
exposure of the TIMSS reports since 1997 
was a deciding factor in the decision to use an 
externally developed test. Using the publicly 
released items from NAEP and TIMSS was 
less expensive than developing their own 
tests, but required the expertise of CEEB. 
Davila closed by observing the need for 
common metrics of student academic 
achievement that could be used across SSI 
sites. 

Mark St. John, President of Inverness 
Research, drawing upon his training as a 
physicist, began by defining how he uses the 
terms evaluation and systemic reform. 
Evaluation refers to figuring out the value, the 
benefits, and the contributions that accrue 
from the public investment that is being made 
in the systemic initiatives. According to the 
theory of systemic reform, the instruction
students receive (the quality of their learning
experiences in schools) is directly shaped by
the system (the political and institutional
context) that surrounds classrooms. Any
successful attempt to improve the quality of 
instruction must assume a systemic 
perspective in design and implementation. In 
education, as in other complex endeavors, 
there are many necessary yet not sufficient 
system supports that must be present if the 
system is to function well. Another important 
aspect of systemic reform is that the people 
who do the work of improving the system 
must be those living and working within the 
system. This means the people within the 
system must have the capacity and expertise 
to bring about intelligent change. Here 
capacity must consist of internal skills and 
knowledge, as well as access to external 
resources and expertise. 

St. John identified “accountability 
misconceptions” that people have about how 
an education initiative should be judged that 
need to be confronted by evaluators. One is
that the “last input is the only input,” based on the
conception that the quality of a teacher can be 
assessed by measuring the achievement of the 
teacher’s students. This ignores all of the 
educational and other experiences students 



had prior to being in that teacher’s classroom. 
Another misconception is that improving only 
teacher preparation programs will improve the 
quality of teachers. This misconception also 
ignores the years of prior schooling the pre- 
service teachers have had. A third 
misconception is that the quality of a program 
or school can be judged by how high the test 
scores are. It is more accurate to identify good 
schools by those that add significant value to 
the knowledge and skills that students bring to 
schools. A fourth misconception is that a 
program, such as the SSI, is the only program 
in existence and can be studied in the absence 
of inputs from any other program, so that 
clear effects can be attributed to each specific 
program. Evaluators need to be critics of 
unexamined and incorrect understandings. 

St. John explained in more detail the 
difficulties that exist in establishing the value 
of NSF’s investment in systemic reform. One 
difficulty is the scale of the investment in 
relation to the total education budget in the 
systems seeking change. Another is that there 
are many other factors that contribute to 
improved student learning. A third difficulty 
is that the impact of the investment on 
learning of any one student is very small by 
the time the investment is channeled through 
the many layers of the system, from 
administration, curriculum, schools, and teachers 
to classroom activities. A fourth difficulty is 
that the actual time required for NSF’s 
investment to have an impact might be longer 
than the funding period of five years. 

Based on a study for the National 
Academy of Sciences, St. John developed a 
model depicting the relationship of key 
variables that could be used to judge the 
probability that an SSI would succeed. In this 
study, the single most important factor was 
the quality, expertise, commitment, and 
political power of the leadership. Other 
important factors were the knowledge and 
expertise that exist within the reform itself 
(design), policy and reform infrastructure, 
discretionary funding that can be allocated 
specifically towards reform, and the political 
and public demand for reform. For systemic 
reform to be effective, these factors have to 
overcome barriers that include scale of the 
system, political “cross currents,” severe 
financial problems, instability and turbulence 
in the system, other reforms, and competing 
priorities. Based on this model, what NSF 
should be held accountable for should be the 
degree to which its investments build the state 
or district capacity for initiating, and 
sustaining, reform. NSF should not be held 
accountable for what a state or district does 
with this capacity. He advises that it is better 
to document the contributions the systemic 
initiatives are making to increase the capacity 
of teachers and others than to argue that they 
are directly causing increased student 
achievement in the short term. 

CRITICAL ELEMENTS OF AN EVALUATION OF SYSTEMIC REFORM 



Patrick M. Shields, Andrew A. Zucker, and Nancy E. Adelman 

SRI International 



In 1990, the National Science Foundation 
(NSF) launched the Statewide Systemic 
Initiative (SSI) program to help states 
undertake comprehensive and coordinated 
reforms of mathematics, science, and 
technology education. Between 1991 and 
1993, NSF signed five-year cooperative 
agreements with 25 states and the 
Commonwealth of Puerto Rico to carry out 
standards-based systemic reform throughout 
their jurisdictions. 

To assess the extent to which states have 
undertaken the kinds of changes envisioned 
by NSF, and to examine the efficacy of 
different SSI strategies, the Foundation 
contracted with SRI International to conduct a 
national evaluation of the program. This paper 
reviews the framework the evaluation team 
used for assessing the progress of the SSIs, 
outlines the evaluation methodology, and 
reflects on a number of challenges involved in 
evaluating systemic reform. 4 

A Framework for Assessing State 
Strategies 

The concept of systemic reform has been 
outlined by Smith and O’Day (1991) and 
elaborated numerous times since then (see 
Clune, 1993; Fuhrman & Massell, 1992; 
Fuhrman, 1993; Vinovskis, 1996). The 
essence of the concept is that ambitious 
standards for student learning should form the 
basis for the alignment of all policies, 
practices, and resources throughout the 
educational system. Fundamental to the 
concept is that ambitious goals apply to all 
students, not just those destined for 
professional careers (O’Day & Smith, 1993). 

4 This paper is based on a series of reports 
produced for the evaluation, references to which 
can be found at the end of this paper. 

To guide the evaluation of the SSI 
program, the evaluation team developed a 
conceptual model of systemic reform, shown 
in Exhibit 1, specifying the key components 
of the educational system that needed to be 
reformed in concert. The Exhibit shows SSI 
activities or investments moving in two 
related but distinct channels. One set of 
investments has been made for activities 
relatively close to students and teachers, 
including support by the SSIs for professional 
development. A second set of activities has 
focused on activities more distant from 
classrooms, such as the development and 
dissemination of state curriculum frameworks. 
Because systemic reform aims to change both 
student outcomes and the education system 
itself, both sets of activities have been 
important. However, different SSIs have 
supported widely varying combinations of 
strategies to effect changes at different levels 
of the education system (see Zucker & 
Shields, 1997). Other key features of the 
model are as follows. 

The Top of the Model: Students, Teachers, 
Classrooms, and Schools 

By placing student outcomes at the apex 
of the figure, the model emphasizes that the 
overarching goal of systemic reform is to raise 
student achievement, increase students’ 
interest and enrollment in challenging 
courses, and otherwise improve education 
outcomes for young people. Improvements in 
student learning rest on improved classroom 
experiences. Such experiences are 
characterized by active student engagement 
with real-world scientific and mathematical 
problems, critical inquiry into a limited set of 
topics, and opportunities for actual scientific 
thinking and discourse (CSMEE, 1997; 
Project 2061, 1993). In contrast to the typical 
American school, classrooms that provide 
such experiences are marked by less teacher- 
directed instruction, more student-student 
interaction, the flexible organization of space 
and time in line with the specific learning 
goals at hand, and regular constructive 
feedback to students based on their 
performance on actual mathematics and 
science tasks (Tharp & Gallimore, 1989). 

The creation of such classrooms, the 
model continues, calls for teachers with a 
new set of skills, resources, and knowledge. 
Teachers must have a thorough command of 
their subject matter, which is an especially 
challenging task in mathematics and science, 
particularly at the elementary level (Cohen 
& Hill, 1997). Teachers must understand 
how students learn and how to structure 
learning opportunities to capitalize on 
students’ knowledge and learning styles 
(Darling-Hammond, 1996; National 
Commission on Teaching and America’s 
Future, 1996). Perhaps most importantly, 
they must believe that all their students can 
master challenging content. 

Beyond content knowledge and 
pedagogical skills, teachers and their 
students must have access to appropriate 
tools and instructional materials. They need 
classroom technology (e.g., lab equipment, 
graphing calculators) and high-quality 
instructional materials. Access to 
appropriate curricula is particularly 
important because the challenge of creating 
inquiry-centered classrooms is already so 
daunting that without good curricula, 
teachers are faced with the prospect of 
creating their own materials while 
simultaneously struggling to change their 
own practice (Adelman, 1997; Zucker, 
1997). 

The provision of needed material 
resources, as well as the time teachers need 
to plan and assess the teaching and learning 
in their classrooms, calls for associated 
changes in the culture and organization of 
schooling. Fullan (1996) uses the term 
“re-culturing” to refer to fundamental shifts in a 
school away from traditional norms 
structured by bureaucratic roles to a 
philosophy where student attainment of high 
standards is the central concern of all staff. 
Restructuring refers to the reorganization of 
standard operating procedures, especially 
time and the use of space, to promote 
student and teacher learning. From this 
perspective, schools that are supportive of 
teachers creating effective classrooms are 
characterized as learning organizations. 
Teachers have time away from children to 
interact and reflect with their peers; 
resources are allocated to optimize learning; 
and the scheduling of class periods as well 
as the grouping of students is flexible and 
driven by learning goals (Elmore & 
Associates, 1990). Such schools also reach 
out to others because they require the 
support and buy-in of parents and the local 
community. Parent and community support 
is especially important when fundamental 
shifts in classroom practice are envisioned, 
as promoted by systemic reform (Shields & 
Knapp, 1997). 

The Base of the Model: Districts, Regions, 
and States 

To support reforms at the school and 
classroom levels on any scale requires 
coordinated and coherent reforms at the 
levels of states, regions, and districts. Of 
paramount importance is the alignment of 
policies at the state and local levels. The 
misalignment of traditional 
basic-skills-oriented, norm-referenced tests with new 
and ambitious goals for student learning was 
one of the fundamental concerns of the 
proponents of systemic reform (Smith & 
O’Day, 1991). More coherent and robust 
policies are needed to send consistent 
messages to educators and the public about 
what is valued. Beyond assessments and 
frameworks, there are a host of policies 
under the control of either the state or local 
districts, depending on political traditions, 
that influence who ends up in classrooms, 
how teachers teach, and what support 
teachers receive. 

Beyond policy, there is the need for 
building an infrastructure at the district, 
region, and state levels that will provide 
human and material support required for 
school and classroom reforms. The task 
faced by district and state administrators is 
no less challenging than that confronting 
classroom teachers, and the “system” that 
holds together district and state efforts is just 
as disjointed as the typical school. Systemic 
reform calls for districts and states to 
jettison their traditional role as regulators of 
local practice and assume the new role of 
technical assistors to schools. They have to 
understand, and be willing to address, the 
resource allocation, professional 
development, and organizational issues 
raised by the reforms (see Spillane & 
Thompson, 1997). 

A third factor that districts and states 
need to address is incentives for reform. 
Changing practice requires extra time and 
effort by teachers (time for learning, time 
for redesign), and it entails some risk, 
including the possibility of inadequate 
performance; negative reactions from 
colleagues, students, or parents; or lower 
achievement. So teachers must be highly 
motivated to undertake changes; they must 
have compelling reasons for taking on the 
work and the possible risks. Persuading 
large numbers of teachers and school 
administrators to engage in the work of 
reform requires the alignment of existing 
incentives with reforms, the elimination of 
disincentives, and sometimes the creation of 
new or additional incentives. Guidance 
mechanisms such as state standards, state 
and local assessments, and personnel 
evaluation criteria are all critical parts of the 
incentive structure affecting classroom 
practice. Many reformers also call for strong 
accountability systems that include public 
release of student outcomes and clear 
rewards and sanctions (David, 1990). 

The fourth reform task at the state and 
district levels involves building professional 
and public support for the reform agenda. 
Systemic reform requires widespread public 
acceptance and support. The public may 
sometimes appear apathetic about 
instructional reforms, but changes in the 
classroom that depart from the public’s 
conceptions of “real” school will quickly 
galvanize parents if their support has not 
been obtained in advance. In democratically 
controlled school systems with weak 
professional structures, classroom practice is 
not determined solely by professionals. 
Instead, teaching practice is subject to close 
public scrutiny by parents and community 
members, and changes in practice require 
public acceptance, as well as formal 
approval by local boards. 

A well-specified vision of student 
learning goals forms the basic premise of all 
versions of systemic reform. The argument 
is simple: coherence and alignment in the 
educational system must be guided by a 
shared understanding of what we want 
students to learn. In the early writing on 
systemic reform, this vision was likely to be 
specified in curriculum frameworks, again 
based on the experience of California in the 
1980s (Smith & O’Day, 1991). Throughout 
the mid-1990s, as states tried out many of 
the ideas of systemic reform, curricular 
frameworks were replaced by state standards 
as the key vehicle for communicating a 
vision of high-quality instruction and 
learning. In fact, the term systemic reform 
was often replaced in the literature by the 
term “standards-based reform” (David, 
Shields, Young, Glenn, & Humphrey, 1997). 

High-level leadership and collaboration 
among key institutions at both the state and 
local levels are required to help assure the 
legitimacy of the reform vision and thus its 
political power to guide shifts in policy and 
practice, as well as to motivate the 
concentration of resources needed for 
reform. The task of fundamental reform is 
both technical and political. Technically, it 
requires collaboration among the best 
minds to set standards, realign assessment 
systems, restructure incentive systems, and 
build an appropriate infrastructure to support 
the reform effort. Politically, it requires the 
will to agree on a single set of learning 
outcomes, to establish appropriate 
accountability mechanisms, to build public 
support, and to garner the necessary fiscal 
resources. Achievement of both the 
technical and political tasks of reform is 
impossible without the buy-in and support of 
the top leadership. 

Systemic Reform: A Summary 

In summary, the model of systemic 
reform we have outlined follows a 
deceptively simple logic. Based on clear 
standards for what students should know and 
be able to do, and with the support of the 
key leadership, states and districts must 
align policy, build the capacity to provide 
schools and teachers with needed human and 
material support, restructure incentive 
systems, and build professional and public 
support for the reform agenda. These actions 
in turn are meant to provide the support 
needed to help increase teachers’ capacity to 
implement the reform vision with access to 
appropriate material, within schools 
organized to support their efforts, and with 
the support of parents and community. In 
such contexts, the argument continues, 
reformed classroom practice can occur and 
student learning will increase. 

The Evaluation Methodology 

The evaluation is based on data 
collected from a wide variety of sources. 
Three sources were most important. First, 
quantitative data were gathered annually 
from the principal investigators in each SSI. 
In addition, the evaluation team conducted 
repeated site visits in every SSI. Finally, 
secondary data analysis included careful 
study and, in some cases, reanalysis of data 
sets gathered by many of the SSIs as part of 
their ongoing efforts to assess progress 
toward reaching their goals. 

The evaluation included a set of 12 
detailed case studies, for SSIs in Arkansas, 
California, Connecticut, Delaware, 
Kentucky, Louisiana, Maine, Michigan, 
Montana, New York, Vermont, and 
Virginia. The time on-site in each case study 
state averaged about 50 person-days. Site 
visiting took place both during the school 
year and in the summer. The evaluation team 
described in detail more than two dozen 
districts and more than three dozen schools 
in the case study states, although the written 
descriptions were not published. 



In the thirteen non-case-study states, the 
time on-site averaged about six person-days 
per SSI, and, again, a very large amount of 
information was gathered and analyzed 
about each of them. By design, these visits 
were briefer, were less frequent, and 
typically involved only a single evaluator. 
Written descriptions were not published; 
however, they averaged about 25 pages 
single-spaced for each of the non-case-study 
SSIs. 

In all the states, on-site visits were 
supplemented with telephone interviews, in- 
person interviews at periodic meetings of the 
SSI principal investigators and project 
directors, and extensive document analyses. 
Documents reviewed included monitoring 
reports about each SSI that were produced 
by Abt Associates, multiple documents 
written by each SSI (such as annual reports 
to NSF), and reports of a number of 
evaluations conducted for specific SSIs. The 
latter were especially useful for developing 
two of the evaluation reports that focus on 
what selected SSIs learned about the impacts 
of their activities on teachers’ classroom 
practices and on student achievement. As 
necessary, information about particular 
states was also updated via telephone or 
e-mail to be sure information in each report 
was current. 

Reflections on the SSI Evaluation 

The evaluation of statewide systemic 
initiatives in 26 states presented a massive 
challenge: we were essentially attempting to 
track the progress of 26 distinct efforts to 
reform the entire system of education and to 
then make overall judgements of the success 
of those efforts in the aggregate. In 
undertaking this daunting task, we learned 
some lessons about what we did right and 
about where future evaluations can be 
strengthened. 

What We Did Right 

Our overall approach to the evaluation 
(building on a clear model of systemic 
reform, studying entire state systems, and 
identifying specific reform strategies) proved 
useful in meeting the challenges presented in 
this evaluation. 

A Model of Systemic Reform as an 
Evaluation Framework. We began the 
evaluation of the SSIs with a cruder version 
of the model presented earlier in this paper. 
This model served us well in identifying a 
set of components that should be included in 
system-wide reform and it helped us to 
develop hypotheses about the 
interrelationship of these components that 
could be tested against the empirical data 
from the sites. In general, we found that the 
model provided the appropriate categories 
and relationships among them to describe, 
analyze and assess the activities of the 
individual SSIs. The model also served to 
facilitate cross-case comparisons and to 
underscore areas where the SSIs, taken as a 
whole, had more or less impact on the entire 
system of education. 

Conducting In-Depth Case Studies of 
Entire State Systems. Systemic reform by 
definition is meant to involve an entire state 
system. Each of the participating systems 
presented a unique set of circumstances — 
not only in terms of demographics, 
geography, political culture and fiscal 
resources, but also in terms of ongoing 
reform efforts into which the SSI fit. 
Understanding the progress of the SSI 
required understanding the evolution of 
educational reform in the state as a whole. In 
short, the SSI could not be studied as a 
“project,” separate from other reform 
initiatives. Consequently, we chose to 
conduct in-depth case studies in a range of 
states in order to tell the full reform story in 
those contexts. These provide detailed 
descriptions and analyses of the progress of 
individual SSIs within the context of 
mathematics and science reform in their 
states. 

Identification of Specific State 
Strategies. Each of the SSIs’ total reform 
efforts consisted of a set of related change 
strategies. We identified eight of these: 

• Supporting teacher professional 
development 



• Developing, disseminating, or 
adopting instructional materials 

• Supporting model schools 

• Aligning state policy 

• Creating an infrastructure for 
capacity building 

• Funding local systemic initiatives 

• Reforming higher education and the 
preparation of teachers 

• Mobilizing public and professional 
opinion. 

Although we were not able to identify a 
small number of “models” or “types” of 
systemic reform with which to categorize 
and assess the SSIs, the identification of 
these eight strategies facilitated cross-site 
analysis and allowed us to assess the 
strengths and weaknesses of focusing on 
specific components of the system. 

Taken together, the use of a 
comprehensive framework, the focus on 
whole state systems, and the identification 
of specific SSI strategies allowed us to 
provide accurate pictures of individual 
states’ progress while making cross-site 
conclusions about the relative efficacy of 
different SSIs. 

The Jury Is Still Deliberating 

The evaluation did not meet every goal 
we set out for ourselves. In retrospect some 
of these goals may not have been realistic or 
even possible. Yet, as researchers and 
evaluators consider future work, it is 
worthwhile to reflect on some of these 
issues. 

Reliance on the SSIs for Statewide 
Impact Data. The goal of systemic reform 
is to improve teaching and learning. We 
found it infeasible to collect comparable 
classroom and student impact data across all 
26 states, the thousands of schools, and tens 
of thousands of teachers involved in the 
reforms. Early on in the evaluation, we 
developed and piloted a teacher survey that 
sought comparable classroom data. But we 
found it impossible to calibrate an 
instrument that was sensitive enough to 
gauge the kinds of teaching practice we 
were interested in and that could be used 
across multiple SSIs. The development and 
implementation of multiple surveys for 
many different SSIs was deemed too 
expensive. As a result, we relied on the data 
from the case studies, which always 
involved a small subset of classrooms, and 
on data from the SSIs’ internal evaluations. 
Because of the unevenness of the internal 
evaluations, we were left with very uneven 
data on the classroom impact. 

Much the same can be said regarding 
student learning. NSF made a decision early 
on in the evaluation not to support a 
common assessment instrument across sites 
nor to require the use of a specific 
instrument. During the course of the SSIs, 
most states changed testing policies at least 
once and many never implemented a test 
designed to assess the type of learning the 
SSIs sought to promote. As a result, we were 
left with no data on student achievement 
from a number of SSIs and non-comparable 
data where they existed at all. 

We did end up producing reports on 
both student achievement and classroom 
impacts, but each was based on data from 
selected states and neither provided 
quantitative cross-site analyses using 
comparable measures of progress. 

Creating a Report Card. NSF invested 
heavily in the SSIs and in the evaluation of 
their progress. At different times during the 
evaluation, the Foundation sought a concise 
summary of the relative progress of the 
states. We found this task quite challenging 
as it required us to reduce the complex 
stories of 26 initiatives operating in very 
different reform contexts to simple scores or 
rankings. In a compromise with NSF, we 
ultimately produced an internal memo to the 
Foundation in which we scored the progress 
of the 26 SSIs for each of the eight strategies 
described earlier in this paper as well as for 
a set of crosscutting dimensions. The 
ultimate analysis resulting from this exercise 
will be published in a forthcoming paper 
(Adelman, Shields, & Zucker, 
forthcoming), although the state-by-state 
ranking will not be made public. 



Whether a reliable “report card” could 
have been produced remains an open 
question. Efforts by the American 
Federation of Teachers, Education Week, 
and others to assess components of state 
reform efforts have proven quite unreliable. 

Identifying and Assessing “Models” 
of Systemic Reform. From the beginning of 
the evaluation, NSF encouraged the 
evaluation team to identify a small set of 
models of systemic reform. The goal was to 
identify a limited set of approaches to 
systemic reform and then to assess their 
relative efficacy. The argument was that 
while there was certainly more than one way 
to reform a system of mathematics and 
science education, there were certainly fewer 
than 26 ways to do so. 

In the end, we made a great deal of 
progress toward the goal of identifying 
models, but never quite reached it. As 
discussed earlier, we did identify a finite set 
of strategies for achieving systemic change. 
We identified which SSIs employed these 
strategies, we described which SSIs relied 
heavily or even primarily on one or another 
strategy, and assessed the degree to which 
individual SSIs succeeded in implementing 
an individual strategy. We were also able to 
assess the degree to which the SSIs used 
various strategies in more or less 
comprehensive approaches to full system 
reform. Yet, because each of the SSIs 
employed multiple strategies in different 
combinations and within very different 
contexts, we were not able to identify a 
small set of model approaches to systemic 
reform. 

EVALUATORS’ ROLES: 

WALKING THE LINE BETWEEN JUDGE AND CONSULTANT 

Jeanne Rose Century 
Education Development Center 



The practice of educational evaluation has 
recent historical roots in the early part of the 
century when intelligence tests and the notion 
of “scientific management” of education were 
first developed. The principles that grew out 
of this movement, such as using carefully 
crafted tests to find “scientific” solutions to 
educational problems, exerted influence on 
what today has become the educational 
evaluation enterprise. Their remnants are 
evident in many of today’s evaluations in 
which evaluators serve as “judges” and gather 
quantitative data on student, teacher and 
school performance in order to draw 
conclusions about program effectiveness and 
worth. 

This role of “judge” is a necessary, 
frequent part of many evaluation plans. But as 
new theories about evaluation practice have 
evolved over the last few decades so have 
new ideas about evaluators’ roles emerged to 
expand and complement this basic function. 
Education reformers, researchers, evaluators 
and funders have debated these roles and the 
“place” of the evaluator in a reform. They 
have asked whether evaluators should be 
internal or external to programs; whether 
reporting and feedback should be summative 
or formative; and whether to use qualitative or 
quantitative methodologies. Madaus and 
Kellaghan capture this debate by stating that 

the emphasis that evaluation and 
assessment have received and the form 
they have taken at different points in 
history reflect differences in the nature of 
education and the determinants of school 
achievement, the importance of 
accountability, and the purpose of 
evaluation. (Madaus & Kellaghan, 1992, 
p. 121) 



Now, the systemic reform movement 
again stimulates the development of new 
educational theory. In turn, evaluators must 
consider that this is also a time of change for 
the evaluation enterprise. Systemic reform 
calls for new roles for project leaders, school 
administrators and teachers; it seems the role 
of the evaluator is likely to change as well. 

Evolution of Evaluation Roles 

There is little, if any, consensus as to what 
evaluators’ roles should be, whose values 
should be represented in evaluation, and what 
questions evaluators should ask (Shadish et 
al., 1991). While a program’s goals, purposes 
and context ultimately determine the answers 
to these questions, over the last forty years, 
theories of evaluation have developed which 
influence this debate. Evaluation has become 
a more prominent enterprise in the education 
endeavor, bringing with it new descriptions of 
evaluators’ roles and functions. 

In Foundations of Program Evaluation: 
Theories of Practice, for example, the authors 
describe the evolution of new ideas about 
evaluation in three stages (Shadish et al., 
1991). The first stage was rooted in a 
scientific approach to finding successful 
solutions to social problems. The second stage 
grew in the 1970s and represented an interest 
in departing from traditional practices to 
create approaches to evaluation that were 
more practical and would be of more use to 
the programs. The third stage of evaluation 
theory was focused on integrating all of the 
methodologies and strategies that had come 
before into a more “coherent” approach to 
evaluation (Shadish et al., 1991). 

Guba and Lincoln (1989) also developed 
a categorization scheme for evaluation. They 
describe four “generations.” The first has 
evaluators in the role of “technician,” in 
which they are familiar with an existing set of 
measurements and identify the most 
appropriate for the task at hand. The second 
generation places evaluators in the role of 
“describer,” in which they extend the 
measurement role to include “chronicl[ing] of 
program strengths and weaknesses.” The third 
generation casts evaluators as “judges,” who 
assess the worth of a program, and the “fourth 
generation” evaluator is one who retains each 
of the previous roles, but adds new ones such 
as: collaborator, learner/teacher, reality 
shaper, mediator and change agent (Guba & 
Lincoln, 1989; O’Sullivan, 1995). 

Shadish’s third stage and Guba and 
Lincoln’s fourth “generation” place the 
theoretical discussion about evaluation on a 
trajectory that seems compatible with the 
evolution of systemic reform. Just as theory of 
educational change is evolving to 
accommodate systemic approaches, so are 
discussions about evaluators’ roles evolving 
to encompass a wider range of responsibilities 
and purposes. In the early part of this decade, 
for example, Beswick wrote that the role of 
the evaluator was moving from what one 
might describe as technical roles to more 
political and advisory roles (Beswick, 1990). 
Similarly, others suggested that education 
reform efforts needed evaluators to function 
as coaches or collaborators in order to most 
effectively respond to the demands for 
accountability and impact (McColskey, 1995). 
Now, as systemic reform becomes more 
widespread, opportunities for, and interest in, 
such non-conventional roles grow with it. 

Evaluators’ Roles in Systemic Reform 

Just as there is no single approach to 
implementation of systemic reform, there is 
no single model for evaluating it. There do, 
however, seem to be some common themes 
regarding evaluation roles in systemic reform 
that are likely to influence how evaluators 
develop evaluation plans and strategies. First, 
evaluators can expect to play multiple roles. 
Building from Shadish’s third stage of 
“coherence,” or Guba and Lincoln’s fourth 
generation, evaluators of systemic reform 
must have a versatility that will allow them to 
serve a variety of needs. Systemic reform is 
quite complex and involves multiple 
stakeholders. Consequently, evaluators may 
need to shift roles to best match the various 
targeted areas of study within the systemic 
reform and to best accommodate the interests 
and needs of the client at any particular time. 

Second, hand in hand with the complexity 
of systemic reform is the dynamic, fluctuating 
nature of the systemic endeavor. This suggests 
that evaluators must do more than 
accommodate different roles at different 
times; they also need to move in and 
out of those roles in a flexible manner. 
Depending on how the evaluation is 
organized, individual evaluators may work 
within a clearly defined set of roles, or they 
may need to play multiple roles somewhat 
simultaneously. Every set of evaluation roles 
for a systemic reform effort will be different; 
each evaluation effort will have a somewhat 
different purpose and goal. There is no 
predicting which roles will always have to be 
played when, but it is important that those 
participating in the evaluation together have a 
palette of roles which can allow them to best 
meet the needs of the reform and fulfill the 
goals and purposes of the evaluation. 

Nearly 15 years ago, Maurice Eash 
recognized the potential for change in 
evaluation. He wrote: 

. . . our relationship with the client has 
changed drastically. ... The process has 
moved from one that was set very much 
in advance to one that requires a 
continuing interface with the client and is 
largely evolutionary. ... As for the future, 
I believe the following well-established 
trends will continue: a) close interaction 
of the evaluator and client throughout the 
life of a project using multilevel 
evaluation, b) evolving rather than fixed 
evaluation designs and c) addressing of 
design questions and findings to 
numerous interest groups by giving equal 
attention to the contextual politics 
involved as well as the technical 
demands. (Eash, 1985, p. 252) 

This description seems to capture some of the 
issues underlying evaluation of systemic 
reform quite well. The evaluator-client 
relationship may no longer be one that is 
strictly formal and confined to conventional 
roles. Rather, it is likely to adapt to the needs 
of the reform, the evolving purposes and goals 
for the evaluation and the specific 
client/audience at any particular time. 

The range of roles, then, that evaluators 
might be called upon to play is expanding. 
The list is long and includes collaborator, 
documentor, critical friend, advocate, 
teammate, coach, and change agent, to name a 
few. Discussion of some of these roles has 
continued over the years but was 
largely ignored until the emergence of 
systemic reform. Other roles are new with the 
arrival of the systemic scheme, and still others 
have attained greater significance and 
meaning in the systemic arena. 

In order to understand how these and 
other roles are necessary to address the needs 
and purposes of a systemic program, one must 
look closely at the organizing goals of 
systemic reform efforts. One can argue that 
these goals fall within two domains. One 
domain, much like conventional reforms, 
focuses on improving educational practices 
and outcomes. The second domain focuses on 
capacity building and is tied to the nature of 
systemic reform as a continuous endeavor 
(Century, 1997). Therefore, in addition to the 
structural and instructional changes that a 
mathematics or science reform puts in place 
(the first domain) a systemic reform must also 
focus on establishing policies, practices, and a 
culture and environment that will support 
continuous positive development in the future 
(the second domain). Success of systemic 
reform, then, includes both the establishment and 
maintenance of new instructional changes and 
an enduring ability to reflect on, 
reevaluate, and improve both new and old 
practices once they are in place. 

There are three categories of roles that 
evaluators can play when responding to these 
two domains: evaluation roles, which 
evaluators have played in the past but which 
take on increased importance and significance 
in the context of systemic reform; systemic 
perspective roles, which are new or 
uniquely significant in the context of systemic 
reform; and technical assistance roles, which 
are typically played by technical 
assistants but can be appropriate for 
evaluators in the context of systemic reform 
(Century, 1997). Placement of a role in one 
specific category does not exclude it from others; 
some roles may overlap into two or even all 
three categories. 

The two domains of systemic reform 
goals link directly to the traditional evaluation 
roles and the technical assistance or 
consultant roles respectively (“systemic 
perspective” roles won’t be addressed here). 
For example, an evaluator addresses the first 
domain (improvement in educational practices 
and outcomes) through judging gains in the 
education change process. Evaluators might 
ask questions such as: “Are new instructional 
materials in place?” “Has professional 
development improved?” “Do student 
assessments reflect the changes in the 
curriculum?” and “Is there improvement in 
student performance?” Evaluators identify the 
presence of changes in these various aspects 
of the educational program and the extent to 
which those changes are of high quality. Then 
they can make judgments about the success of 
the reform. 

This might be where evaluations of more 
conventional programs stop. In systemic 
reform, however, evaluators also can turn to 
the second domain of goals: capacity building. 
In doing so, they need to consider whether 
there is sufficient capacity in the system to 
sustain continuous growth. This still aligns 
with the role of judge, but brings evaluators to 
the edge of what some would consider 
unacceptable practice. If evaluators see that 
there is insufficient capacity in the system, 
they are confronted with a challenge: whether 
or not to cross the line from judging capacity 
and take on a consultant role to assist in 
building the capacity. Some evaluators have 
found that this question is answered for them. 
Reluctantly, or even perhaps unwillingly or 
unknowingly, they find themselves responding 
to the needs of the client by crossing what is 
sometimes a blurry line between evaluator 
and consultant. 

Should an Evaluator Assume the Role of 
the Consultant or Technical Assistant? 

In writing about providing assistance to 
third world nations, Harari describes the 
technical assistance expert as one who is “an 
instrument of communication between two 
worlds ... his official vocation is to foster the 
development of the country to which he is 
sent by virtue of the work he does” (Harari, 
1974). Some evaluators working in the field 
in systemic reform would concur that even 
though they don’t set out to play technical 
assistance roles that fit this description, 
project leaders and participants implicitly ask 
them to do so. For many, engaging in 
technical assistance roles is inappropriate. 
Such actions compromise what is, in their 
eyes, the necessary objectivity of the 
evaluator and the evaluator’s ability to be a 
fair judge. These concerns weigh heavily in a 
field that has worked to develop a careful set 
of methodologies and strives for the 
credibility that results from adherence to these 
methods and the standards that accompany 
them. 

And yet, as mentioned above, systemic 
reform seems to call for a redefining of roles 
of all participants. Like state and district 
administrators, teachers, and other 
stakeholders, evaluators and technical 
assistants “have been taught and have come to 
believe a set of conventions about what their 
titles mean and what actions are within and 
outside of their bounds” (Century, 1997). 
When exploring the new realm of evaluation 
of systemic reform, then, they may find it 
necessary to question some of the standing 
assumptions about roles and consider the 
appropriateness of those roles in a different 
light. 

The suggestion that evaluators begin to 
act as technical assistants is not new. Writing 
about utilization of evaluation findings, for 
example, Eash noted that the most significant 
factor contributing to increased use of 
evaluation findings was when “evaluation 
assumed an ‘educative’ approach to the client 
throughout the process” (Eash, 1985). More 
recently, writing specifically about 
mathematics and science curriculum 
development, O’Sullivan noted that in 
addition to more conventional evaluation 
roles, “funding agencies and other program 
sponsors are requesting that evaluators 
provide technical assistance to programs that 
may require on-going, active evaluation 
assistance through the project development 
process” (O’Sullivan, 1995). 

Ultimately, an evaluator’s choice of role 
is influenced by many different factors 
including the general purposes and goals of 
the evaluation, the relationship with the client, 
the identified audience, as well as the personal 
and professional considerations of the 
evaluator, including experience, style, and 
training (Alkin et al., 1979). Some roles, 
however, evolve without intention, influenced 
by the emerging shape of the systemic reform 
and its evaluation. Evaluators may find that as 
the reform progresses, needs emerge which 
can be met only through assuming some 
unanticipated roles. In conventional reforms, 
evaluation responsibilities stop at directing 
attention to the needs of the project, not 
responding to them. In systemic reform 
settings (or even in what Shadish refers to as 
policy or program settings), the evaluator may 
feel that “responsibility to a specific program, 
its staff and stakeholders often takes 
precedence over traditional role behaviors of 
scientists” (Shadish et al., 1991) and he/she 
may feel compelled to take action in a new, 
unanticipated role. 

Clearly, changes in evaluation that 
include consultant roles place evaluators 
beyond what many evaluators would consider 
acceptable boundaries between the evaluator 
and the evaluand. They call for increased 
involvement that goes beyond the limits 
outlined by some, while staying well within 
the limits set by others. There is a line to be 
drawn somewhere between the evaluation 
endeavor and the systemic reform; that line 
may fall in a different place for different 
reform efforts and for different evaluators. 
While the boundary line will never be 
completely clear, evaluators need to clarify 
for themselves their best understandings of 
where that line is drawn for each role and 
circumstance. 

Redefining Conventions of Evaluation 

Underlying much of this discussion about 
role are deeper questions about the 
implications these roles have for current 
understandings of the credibility, validity, and 
objectivity of the evaluator. Objectivity and 
credibility are typically linked in that without 
objectivity, an evaluator cannot be credible. 
Similarly, the methodologies and relationships 
suggested by some of the roles described here 
threaten current understandings of the validity 
of the evaluation. The specific implications 
for each of these touchstones of evaluation are 
far too complex to address here. However, it 
is important that evaluators recognize that this 
may be a time when the field generates new 
meanings for these words, at least as they 
exist in the context of evaluation of systemic 
reform efforts. 

This paper has been adapted from chapters 
written for the NISE book, Evaluation of 
Systemic Reform in Mathematics and Science, 
which is under preparation. 

ASSESSING STUDENT OUTCOMES 

Overview 

Student academic achievement is often 
the main area of interest for educators and 
policy makers within any discussion of 
systemic educational reform. These 
discussions are usually centered on traditional 
test scores that may or may not reflect what is 
important for reformers and educators; yet, for 
many, they are the only available mechanism 
to demonstrate the impact of an initiative. 
Finding and designing alternative ways to 
measure student academic achievement within 
the new parameters of systemic educational 
reforms has been a major challenge for both 
evaluators and reformers who have searched 
together for answers to accountability 
questions. This paper presents the evolution 
of, and the lessons learned from, a research 
approach to assessment of student outcomes, 
specifically of student academic achievement, 
being used by the Puerto Rico Statewide 
Systemic Initiative (SSI), which is one of the 
statewide systemic initiatives for science and 
mathematics sponsored by the National 
Science Foundation. 

Definition of Outcomes and Outcome 
Variables 

Weiss (1998) describes outcomes as “the 
end results of the program for the people it 
was intended to serve” (p. 8) and further 
comments that outcomes are interchangeable 
with results and effects. Outcomes are 
certainly an end result of systemic educational 
reforms as well as of many other types of 
programs, but the nature and context of these 
initiatives requires a wider definition. For 
example, in systemic educational reforms, 
outcomes can be evident at the level of the 
classroom, school, district, or state. Evaluators 
of systemic educational reforms are usually 
interested in connections between different 
interventions and outcomes, as well as in the 
factors that contributed to the occurrence of 
those outcomes. 

Because of the additional dimensions of 
systemic educational reforms that differentiate 
these programs from other educational 
interventions, distinctions between outcome 
variables and outcomes need to be 
established. In systemic educational reform, 
an outcome variable is a quantity, dimension, 
or quality of the system subject to change 
because of the initiative. A systemic variable 
is an outcome variable that can be measured 
across the system such as student academic 
achievement in science and mathematics. In 
turn, an outcome for a systemic initiative is a 
change in an outcome variable directly 
attributable, or likely attributable, to the 
initiative, such as improvements in student 
learning as a result of participation in 
standards-based instruction in science and 
mathematics. 

Importance of Student Achievement 
Outcomes within Systemic Educational 
Reforms 

The central focus of most systemic 
educational reforms is the achievement of 
challenging academic standards that can be 
demonstrated through improvements in 
student academic achievement. Student 
academic achievement is interrelated with 
aspects of the initiatives such as their visions 
of quality education, expectations of 
performance for participants, definitions of 
equity, and designs of professional 
development interventions among others. 
Further, student academic achievement is a 
concrete indicator of progress that is 
associated with other areas of student success, 
such as college and job placement. Thus, 
systemic educational reforms are often 
expected to provide evidence of having an 
impact on student academic achievement as 
an indicator of the value added by the 
reforms. Consequently, evaluators face the 
challenge of choosing an appropriate data 
collection and reporting design that meets the 
needs of the initiatives and of their multiple 
stakeholders. 

The Evolution of a Research Approach in 
the Assessment of Student Outcomes 

Just like many other systemic educational 
reforms in science and mathematics, the 
Puerto Rico SSI’s central focus is the student 
as an active learner (Shields, Marsh, & 
Adelman, 1998). The Puerto Rico SSI fosters 
the holistic development of the students in 
preparation for their participation in the next 
century, as illustrated in the constructivist 
principles that guide this reform; the Puerto 
Rico SSI envisions the teaching and learning 
process as bi-directional and interactive with 
the guidance of teachers within the context of 
school environments (Davila, Vega, & 
Rodriguez, 1996). A participatory-research 
approach was selected for the evaluation and 
assessment of the Puerto Rico SSI, in general, 
and for the assessment of student academic 
achievement, in particular, because: (1) the 
philosophy that guides this initiative 
emphasizes participant empowerment and the 
development of self-sustaining communities 
of learners; (2) the size and scope of the 
initiative require the involvement of 
increasing numbers of individuals; (3) the 
Puerto Rico SSI’s reformers and participants 
possess expertise in a diversity of areas that 
can significantly contribute to the successful 
implementation of such a model; and, (4) the 
literature available at the beginning of the 
initiative’s implementation (in 1991) clearly 
demonstrated a need for results of systemic 
educational reform based on research (Davila, 
1996). 

Triangulation of results has been a major 
element of the design from the beginning of 
this reform. By comparing findings obtained 
using multiple quantitative and qualitative 
data collection strategies as suggested in the 
literature (Laguarda, Goldstein, Adelman & 
Zucker, 1998), the Puerto Rico SSI has 
identified trends and made pertinent mid-course 
corrections within its encompassing 
systemic strategy. The Puerto Rico SSI’s 
participatory research, evaluation, and 
assessment design involved all the different 
areas being addressed by this comprehensive 
science and mathematics reform (see Dávila 
& Gómez, 1994; 1995; Dávila, Gómez & 
Vega, 1996, among others, for specific 
examples). However, documenting and 
measuring student academic achievement was 
a major area of emphasis in this design 
because of (1) its importance within the larger 
context of systemic initiatives, and most 
importantly, (2) its value for the Puerto Rico 
SSI for decision-making purposes. 

First Version of the Puerto Rico SSI’s 
Model to Assess Student Academic 
Achievement 

The first version of the model consisted of 
collecting and interpreting data at three 
different levels: (1) the classroom; (2) the 
initiative; and (3) the system (see Figure 1) 
(Puerto Rico Statewide Systemic Initiative, 
1997). The description of each one of these 
levels follows. As part of their professional 
development, science and mathematics 
teachers learn to use authentic assessment 
strategies such as open-ended questions, 
performance tasks, portfolios, and multiple- 
choice questions that require higher order 
thinking skills to obtain information about 
student progress. Teachers use the results 
provided by these innovative strategies in 
their classrooms to (1) provide feedback to 
students about their performance and (2) 
modify their teaching, learning, and 
assessment practices. Teachers also translate 
these results into letter grades: schools 
provide grade distributions in terms of 
satisfactory (i.e., As, Bs, Cs) and 
unsatisfactory (i.e., Ds and Fs), before and 
after their participation in the Puerto Rico 
SSI, to identify trends in student academic 
achievement. 
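
The before-and-after comparison of grade distributions described above can be illustrated with a minimal Python sketch. The function name and the grade lists are hypothetical and serve only as an illustration; they are not data or procedures from the Puerto Rico SSI. The only element taken from the text is the split into satisfactory (As, Bs, Cs) and unsatisfactory (Ds and Fs) grades.

```python
from collections import Counter

# Per the text: As, Bs, and Cs are satisfactory; Ds and Fs are unsatisfactory.
SATISFACTORY = {"A", "B", "C"}


def satisfactory_share(grades):
    """Return the fraction of letter grades that are satisfactory (A, B, or C)."""
    if not grades:
        raise ValueError("grades must not be empty")
    counts = Counter(grades)
    satisfactory = sum(counts[g] for g in SATISFACTORY)
    return satisfactory / len(grades)


# Hypothetical grade lists for one school (illustrative only).
before = ["A", "C", "D", "F", "B", "D", "C", "F"]
after = ["A", "B", "C", "C", "B", "D", "A", "F"]

change = satisfactory_share(after) - satisfactory_share(before)
print(f"before: {satisfactory_share(before):.2f}, "
      f"after: {satisfactory_share(after):.2f}, change: {change:+.2f}")
```

A change in this share describes a trend only; by itself it does not show that the trend is attributable to the initiative.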

The initiative’s staff developed a series of 
standards-based pre/post tests in science and 
mathematics to measure the value added by 
the systemic educational reform, as part of the 
second level of the model. These tests 
included multiple-choice items that measure 
higher order thinking skills, open-ended 
questions, and performance tasks. Thus, 
assessment of student academic achievement 
was aligned at the classroom and initiative 
levels. Initially, all participating students took 
these assessments and, later, as the number of 
students and schools increased, representative 
samples of students were selected to represent 
their schools in the assessments. 
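
As a rough illustration of how pre/post results might be summarized, the following Python sketch computes a mean gain for students who took both tests. It is a simplified assumption for illustration, not the analysis used by the initiative: the function name and the scores are hypothetical, and a raw gain of this kind is not by itself a measure of value added.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Mean post-minus-pre gain for students paired by position in the two lists."""
    if not pre_scores or len(pre_scores) != len(post_scores):
        raise ValueError("pre and post score lists must be non-empty and equal in length")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Hypothetical scores for four students in one school (illustrative only).
pre = [40, 55, 62, 48]
post = [52, 60, 70, 58]

gain = mean_gain(pre, post)
print(f"mean gain: {gain:.2f} points")
```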

The third level of the model consisted of 
external indicators of student progress for the 
overall K-12 system. The results of these tests 
provided other measures for the Puerto Rico 
SSI to “take the pulse” of the reform, even 
though they were not fully aligned with the 
standards-based reform. These indicators 
included an adaptation and translation of the 
National Assessment of Educational Progress 
(NAEP) that was administered in 1994 in both 
science and mathematics to samples of 
participating Puerto Rico SSI students (i.e., 
lower socio-economic levels), students from 
private schools (i.e., middle and upper middle 
socio-economic levels), and students from 
non-participating public schools (i.e., lower 
socio-economic levels). They also included 
other tests designed by testing corporations 
and administered by the Puerto Rico 
Department of Education, such as the SENDA 
and the Puerto Rican Competencies Test. 

The first version of the model provided 
very useful information to the Puerto Rico 
SSI. However, as the needs of the initiative 
evolved, new ways to (1) look at student 
academic achievement; (2) provide specific 
formative feedback of student academic 
achievement to multiple players and 
stakeholders; and (3) design more 
mechanisms to drive the improvement of 
student learning in science and mathematics 
were imperative. 

Second Version of the Puerto Rico SSI’s 
Model to Assess Student Academic 
Achievement 

The centerpiece of the second version of 
the model is the science and mathematics 
pre/post tests designed by the Puerto Rico 
SSI’s staff in an alliance with The College 
Entrance Examination Board (CEEB), which 
provided technical expertise for their 
administration and analysis (see Figure 2) 
(Puerto Rico Statewide Systemic Initiative, 
1998). The new tests were designed to 
measure achievement gains over the course of 
one year, using publicly-released multiple- 
choice and open-ended items from NAEP 
(National Assessment of Educational Progress) 
and TIMSS (Third International Mathematics 
and Science Study). The tests were 
administered to the fourth, eighth, and 
eleventh grades; all students from the 377 
Puerto Rico SSI schools participated in this 
new assessment. 

The new standards-based tests are scored 
using a scale equated with the TIMSS scale 
for item difficulty and student ability; a score 
of 500 in either scale equals the international 
average. By using a scale equivalent to that of 
the TIMSS scale, student scores can be 
compared with national and international 
benchmarks of student performance that allow 
the Puerto Rico SSI to place the progress of 
its students within the larger global context 
(see Figure 3). By sharing the schools’ results 
of the pre-tests by content area with school 
principals and teachers, the school can assume 
responsibility for improving student learning 
that can be demonstrated in the post-tests. 
Further, the results of the pre/post-tests will 
be used to refocus the initiative’s professional 
development activities, based on the content 
needs of the students. 

Another key element of the second 
version of the model is the teachers’ 
participation in parallel assessments; its main 
purpose is to identify and correct teachers’ 
weaknesses in content. Teachers receive sets 
of items not included in the tests administered 
to their students (but similar in approach and 
content) during a professional development 
session and are asked to respond to them 
anonymously. An item by item analysis of the 
distribution of their responses leads to a 
discussion of common misconceptions and of 
ways to correct them. These analyses provide 
another mechanism for refocusing the initiative’s 
professional development activities to address 
specific content needs of the teachers. 
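
The item-by-item analysis of anonymous responses described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the item labels, the answer key, and the responses are all hypothetical, and the sketch is not the procedure used by the Puerto Rico SSI or the CEEB.

```python
from collections import Counter


def response_distribution(responses, answer_key):
    """For each item, tally how often each option was chosen and the share correct."""
    report = {}
    for item, correct in answer_key.items():
        # Collect the option chosen for this item on every anonymous answer sheet.
        chosen = [sheet[item] for sheet in responses if item in sheet]
        counts = Counter(chosen)
        share_correct = counts[correct] / len(chosen) if chosen else 0.0
        report[item] = {"counts": dict(counts), "share_correct": share_correct}
    return report


# Hypothetical answer key and anonymous answer sheets (illustrative only).
answer_key = {"item1": "B", "item2": "D"}
responses = [
    {"item1": "B", "item2": "A"},
    {"item1": "B", "item2": "A"},
    {"item1": "C", "item2": "D"},
    {"item1": "B", "item2": "A"},
]

report = response_distribution(responses, answer_key)
for item, summary in report.items():
    print(item, summary)
```

In this hypothetical tally, most respondents choose the same incorrect option on the second item, which is the kind of pattern that could prompt a discussion of a common misconception.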

An external criterion now included in the 
Puerto Rico SSI’s assessment of student 
academic achievement is the results of the 
college admissions tests administered by the 
CEEB. Since equating studies between the 
SAT and the CEEB mathematics tests show a 
correlation of 0.87, the Puerto Rico SSI can 
confidently compare the results of students in 
the mathematics test of the CEEB with those 
of mainland students in the mathematics test 
of the SAT (see Figure 3). 

Another external criterion is the ratio of 
college admissions to the University of Puerto 
Rico System, which is the most competitive 
university system on the Island. College 
admissions ratios of Puerto Rico SSI 
participants are being analyzed by length of 
initiative intervention (i.e., intermediate 
school only vs. intermediate and high school). 
Distributions of chosen field of studies upon 
admissions are being analyzed in a similar 
way. 

The evolution of the first and second 
versions of the Puerto Rico SSI’s model to 
assess student academic achievement as an 
outcome of systemic educational reform shows 
that considerable organizational learning has 
taken place within the Puerto Rico SSI. The 
following section addresses some of the 
lessons that the leadership of this reform has 
learned in the process of designing these 
models. 

Lessons Learned 

The process of designing the two versions 
of the assessment model required intense 
reflection and thinking by the leadership of 
the Puerto Rico SSI at multiple levels. Since 
the first version of the model had provided the 
initiative with very useful information over 
the years, it was difficult at first to make the 
decision to find another way to measure 
student academic achievement. However, the 
national exposure and dissemination of the 
TIMSS reports since 1997 was certainly a 
factor that prompted us to look for other 
alternatives more in tune with the evolving 
needs of the reform. Using publicly-released 
items from NAEP and TIMSS represented a 
major cost-saving step, since the items had 
already been developed; but, without the 
vision and expertise of The College Entrance 
Examination Board (CEEB), we would not 
have achieved the same results. At the same 
time, the Puerto Rico SSI staff is influencing 
the test design vision of this major player in 
education by emphasizing and modelling the 
use of national standards to guide test design. 
Further, the involvement and engagement of 
teachers in the professional development 
exercises described above gave us pleasant 
surprises, since they sincerely enjoy the 
experience of looking at their own 
performance and, most importantly, they grow 
professionally and personally in the process. 

Final Comments and Next Steps 

One of the major challenges currently 
faced by evaluators and reformers who work 
with systemic educational reforms is the need 
for common metrics of student academic 
achievement. This is a recurrent theme in 
meetings sponsored by the National Science 
Foundation and it is evidently a high priority 
on the national educational reform agenda. 
We believe that the models presented in this 
paper can contribute to advancing the design of 
such metrics. 

The CEEB equated the TIMSS/NAEP scale with the TIMSS International scale to permit valid comparisons. 

Scores of the PR-SSI TIMSS-based test were equated to international scales by the CEEB within an error of 8 points. 

The performance of all eleventh grade students is compared against that of students from high schools 
whose eleventh grade students have four years of PR-SSI experience. 

Multiple Choice Math scores by school sorted by mean - 11th grade 

Private schools: students from families with middle and 
upper middle income backgrounds 

Public schools: students from families with lower income 



UNDERSTANDING THE VALUE OF NSF’S INVESTMENTS IN SYSTEMIC REFORM 

Mark St. John 
Inverness Research 



The Challenge 

I was invited by a sub-committee of 
Congress to talk to its members about the 
value of the National Science Foundation 
(NSF) investments that are being made in the 
Urban, State, and Rural Systemic Initiatives. 
The questions they asked are very simple, 
very basic: “Are they working?” and “What 
are the benefits [in return] for the millions of 
dollars being invested?” Very simple and 
reasonable questions to ask, and yet very 
difficult to answer. My first question as an 
evaluator was: “How do you even think about 
that question? What are the kinds of benefits 
that one might reasonably expect to come out 
of the investment made in the Systemic 
Initiatives?” How do you conceptualize that 
kind of return on investment? 

So today I want to talk to you, and think 
out loud with you, about that question. As 
my friend, Patrick Shields, would say, when 
you think about evaluating systemic reforms, 
the answer to the question depends very much 
on what you mean by “evaluation,” and it 
depends very much on what you mean by 
“systemic reform.” So let me focus briefly on 
what I mean by each of those terms. 

Evaluation, in this case, refers to figuring 
out the value, the contribution, and the 
benefits that accrue from the public 
investment that is being made in the Systemic 
Initiatives. Today, I speak primarily from the 
perspective of an evaluator, a professional 
evaluator, who is trying to think about these 
issues in a substantive, scientific way. I am 
thinking primarily as an evaluator, and not as 
a science educator. That is, I am thinking 
about the serious problem of assigning value 
to an investment that is being made according 
to the “systemic theory” of improving science 
education. I am not thinking as a science 
educator trying to use evaluation as a way to 
further the cause of science education. Thus, I 
am trying to think about the issue of 
evaluation objectively and scientifically, and 
not politically. That is a different exercise. I 
am making an effort to identify the real value 
of these investments, not those values that 
may or may not have political currency. I was 
trained as a physicist. So I began to think 
about this as a physicist. What do I observe in 
the field, and in my interviews, that would 
qualify as a significant contribution and 
value? What might we infer from the evidence 
about the nature of the value and benefits that 
come from NSF’s work? What are the kinds 
of things we could really say about NSF’s 
work if there were no political pressures upon 
us? How would we think about evaluation and 
systemic reform in a clear and straightforward 
fashion? 

The Nature of Systemic Change 

Now for part two of defining my terms. 
When I refer to “systemic reform,” I have a 
fairly simple idea in mind. I think systemic 
reform is quite right in its analytical insight. 
The theory of systemic reform says: The 
instruction that students receive, the quality 
of their learning experiences in schools, is 
directly shaped by the system, by the political 
and institutional context, that surrounds the 
classrooms they are in. The theory of systemic 
reform says: Any attempt to improve the 
quality of instruction that students receive 
must, to be successful, assume a systemic 
perspective in the design and implementation 
of that effort. For example, according to the 
theory of systemic reform, you cannot just 
pick one element of the system (for example, 
curriculum) and work on it, and then expect 
instruction to improve. The system that shapes 
instruction is itself composed of multiple, 
hierarchical, interacting, and complex
systems: systems of professional
development, assessment, curriculum, school 
governance, and others. We know that good 
instruction happens when there is a 
convergence, and alignment, of many 
necessary, but not sufficient, sub-components 
of the larger system. That is, improvement in 
each system component, such as curriculum, 
professional development, and assessment, is 
a necessary, but not sufficient improvement 
when it comes to increasing the quality of 
instruction and raising the level of student 
achievement. 

An analogy (friends who know me are
tired of this example): in the airline
system, it is quite clear there are many 
necessary, but not sufficient, components that 
come together to give us a safe and reliable 
airline system. It is not enough to have good 
pilot training. Good pilot training is clearly 
necessary. But it is not sufficient unto itself. 
One also needs well-designed, well- 
maintained airplanes, good air traffic control, 
and good airports. All of these system 
ingredients are simultaneously required for 
safe airline operation. It may well be possible 
to greatly improve the quality of pilot training 
and yet still have an unsafe airline. Also, we 
cannot improve professional development this 
year and say that we will handle airport 
design and safety next year. All of the 
components must be simultaneously
present, working together and of high
quality. The real issue in all of this is that each
of these components may well be necessary, 
but they are not sufficient. You cannot get 
seven out of eight right, and leave the eighth 
undone. Think about the airline system as a 
helpful guide in understanding how a 
systemic approach and perspective are 
necessary in effecting educational change. 

So, in education, as in other complex 
endeavors, there are many necessary yet not 
sufficient system supports that must be 
present if the system is to function well. That 
is one important notion that underlies the 
theory of systemic reform. 

Secondly, there is another aspect of 
systemic reform that is equally important — 
but is often overlooked. That notion is that the 
people who do the work of improving the
system must be those living and working
within the system. That is, in a systemic 
reform effort, the changer is the changee. It 
does not help very much to have external 
agents (such as university professors, no 
offense) do a lot of research in their 
laboratories, and then present the results to 
school districts. It is the people who live 
within the system (those who work at the 
state, district, and school levels) who must
gain the capacity and have the resources to
improve the functioning of their own systems.
That means that they must have the capacity
and expertise to bring about intelligent
change. Further, this capacity must consist of
internal skills and knowledge, as well as
access to external resources and expertise. So
we see that “capacity building” is also a very
important part of the theory and work of
systemic change. Because the changer must
be the changee, the real work of systemic 
reform lies in building the capacity of those 
who are key change agents within the system. 
Finally, this idea has real implications for
helping us think about how we might evaluate 
NSF’s investment in systemic change — 
which, as you will remember, was our original 
charge. 

Issues in Evaluation 

As we look more carefully at the role of 
evaluation, let me share with you some of the 
concerns I have. There are many definitions 
of “evaluation,” but I want to propose what I 
consider a very simple one: I think evaluation 
is about helping people understand more 
clearly, and in more powerful ways, what is 
actually happening. 

Evaluation should help clarify what is 
real: When, as evaluators, we study a program 
or a curriculum, we should help to 
conceptualize, explain, and illuminate what is 
actually happening. That is a different process 
from helping NSF “make its case.” It is a little 
different from helping a state systemic 
initiative make its case. In our evaluations, we 
should be careful about the inferences that 
people are encouraged to draw from our work. 
We should be encouraging caution, and we 
should be actively discouraging false 
inferences. We should be dispelling, and not
creating, myths and overly simplistic
conclusions.

Evaluation should also help educate and 
inform the public about the realities and 
complexities of education. As evaluators, we 
should use our insights to help people think 
more powerfully about the way the system 
works, about the way learning happens. We 
should be educative. And we evaluators, who 
have a lot of experience in seeing how the 
system does and does not work, should share 
our insights with others. The power of these 
insights is that they are grounded in, and 
intertwined with, the data that we gather. In a 
way, you could say evaluation should be 
about identifying and reducing any gaps that 
may arise between rhetoric and reality, 
between mythology and actuality. That is 
what I think evaluation should be doing. 

In physics education, there are frequently 
discussions about student misconceptions — 
about the ideas students have about the way 
things work that are not at all congruent with 
reality. At this point, I want to share with you 
certain evaluation misconceptions, or perhaps 
I should call them “accountability 
misconceptions,” that people have about 
education that are not congruent with reality. 

At the risk of seeming naive: I know there 
are “political realities” that help to generate 
and perpetuate these kinds of misconceptions. 
But I nonetheless think it is in our long-term
interest to be critical of misconceptions and 
false inferences that try to wrap themselves in 
the respectable cloak of evaluation and 
accountability language. 

There is one misconception, for example, 
that is fairly prevalent. I call it: “the last input 
is the only input.” That is, it is often argued 
that we can evaluate the quality of a fourth 
grade teacher by assessing the achievement of 
the students of that fourth grade teacher, say 
in mathematics. The fact that students have 
only been in fourth grade for four months and 
have spent eight years in other classrooms or 
settings: well, we do not address that reality.
We create an accountability and evaluation 
system that responds as if the fourth grade 
teacher were solely responsible for what those
students know and are able to do in
mathematics. 

Another misconception of the same ilk: 
We say, if only teacher preparation programs 
were better, then teachers would be of good 
quality. In California, teacher preparation is a 
year-and-a-half program. With little control 
over who enters the program, and relying on 
sixteen previous years of schooling in the 
disciplines, teacher preparation programs are 
somehow held solely responsible for the 
quality of beginning teachers. This kind of 
over-simplistic assignment of cause may be 
politically attractive, but it is scientifically 
very, very weak. 

A further, and closely related, 
misconception equates the absolute value of 
something to the value-added component. For 
example, there are many programs that are 
supposed to have an incremental effect on 
something, but those programs should be
judged by the value they add, not by the 
absolute value of their products. Let me be 
concrete here. You do not judge the quality of 
a psychiatrist by the absolute level of the 
sanity of his or her patients. No. The job of 
psychiatrists is to make their patients saner 
than they would otherwise be. If absolute 
value of sanity were the criterion, it would be 
easy to be a good psychiatrist: you would 
simply start with the sanest people. So it is 
important to make the point that a good 
psychiatrist contributes a greater sanity to the
patients: He or she adds value, makes them 
slightly saner than they would have been 
otherwise. 

In the same sense, you should not say that 
a “good” school is a school where the students
all have high test scores. It is exactly 
equivalent. Good schools add significant 
value to the knowledge and skills that those 
students bring to the school. I have been in 
some very prestigious schools that, I would 
argue, add very little value to the bright kids 
who come in. On the other hand, I have been, 
just recently, in some wonderful schools 
where the students are scoring in the lowest 
twentieth percentile, schools that are, I
would argue, wonderful schools because their 
skillful, dedicated teachers are bringing very 
disadvantaged students up from zero to
twenty percent. Yet, none of that is
acknowledged in a simplistic accountability 
system. In fact, our very language is 
confusing here when we talk about “high
performing and low performing schools,” as
if the school itself were taking the tests. No,
we should be focusing on schools at which
students, on average, are performing, or not
performing, well on tests, because the
performance of the school, as a provider of 
instruction, is not at all the same as the 
performance of the students. 
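
The distinction between absolute value and value added can be made concrete with a small sketch. The two schools and the prestigious school's percentiles below are hypothetical illustrations, not data from the study; only the "zero to twenty percent" gain echoes the example in the talk.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    entry_percentile: float  # where students start (hypothetical figure)
    exit_percentile: float   # where students end up (hypothetical figure)

    @property
    def value_added(self) -> float:
        # Gain attributable to the schooling period, not the raw end score.
        return self.exit_percentile - self.entry_percentile


schools = [
    School("Prestigious", entry_percentile=90.0, exit_percentile=91.0),
    School("Struggling", entry_percentile=0.0, exit_percentile=20.0),
]

# Ranking by absolute score and ranking by value added disagree.
best_by_absolute = max(schools, key=lambda s: s.exit_percentile)
best_by_value_added = max(schools, key=lambda s: s.value_added)

print("Highest absolute score:", best_by_absolute.name)
print("Highest value added:", best_by_value_added.name)
```

Under these assumed numbers, a simplistic accountability system that ranks on the absolute score rewards the first school, while a value-added view favors the second.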

Thus, my impassioned plea here is to 
evaluators, to suggest that it is our collective 
responsibility to address these kinds of 
misconceptions and over-simplistic notions 
and, in doing so, to help hold accountability 
accountable. It is ironic that accountability
itself (the whole movement of creating
standards, tests, and high-stakes accountability
systems) gets a free ride. Where is the
evidence that accountability systems, used as 
they are now being used, increase student 
learning? I would assert that it is not clear at 
all that accountability exerts a positive force 
on the system, so I argue that we need to be 
critics of unexamined and even incorrect 
notions. We need to be pushing more sensible, 
less political interpretations of how value is 
assigned to schools, to programs, and to 
systemic initiatives. 

Now, to consider one final type of 
misconception that is attractive because it 
makes the world simpler to understand. The 
only problem is that it is wrong. (Some
philosopher once said, “To every complex
problem, there is a simple, and incorrect,
answer.”) This misconception lies in the area
of “attribution.” I would call it the single 
variable, or sole-agent, problem. The 
assumption that evaluations of the NSF and 
other reform efforts seem to make sometimes 
is that the program being studied is the only 
one in town, the only one doing anything. It is 
as if everything else is static and that we are 
working in a laboratory setting where all the 
other inputs are absent and all the variables 
held constant. 

But that is far from the truth. In the very 
language that is used, you often hear 
confusion, not only about the fact that NSF is
not a sole agent, but sometimes it is not an
agent at all. We hear that “NSF is doing 
systemic reform” in a given city or state. But 
NSF does not itself “do” systemic reform.
NSF funds people and programs under a
theory of systemic reform, people and
programs involved in a variety of activities.
Also, while such funding may actually 
represent a lot of money for NSF, it may not 
be much money for the system in which it is 
working. Some reform activities funded by 
NSF can be one of many different things that 
are happening in a larger state and district 
reform context. If you go to big districts and 
ask about what is happening there, it might be 
a long time before anybody mentions NSF. 
NSF money, and NSF activities, are 
inevitably a piece, a small part, of the systems 
they are trying to influence. The NSF 
activities are often not “at the center of the 
state’s or district’s radar screen.” So the 
image of a state or district that is engaged in a 
systemic reform effort funded by NSF is often 
seen from what might be called an
“NSF-centric” point of view.

It is important to note here that
NSF-funded activities are NOT insignificant. They
may well be doing important work within the 
local setting. But they are not the whole piece, 
and very often the NSF work is irresolvably 
mixed in with many, many other activities and 
reform efforts. 

Also, in its Systemic Initiatives, NSF is 
working a great distance from the classroom 
and the students. That is both its power and its 
weakness. As a metaphor, imagine that NSF 
funds and activities are like shining a light 
down a cone. In this picture, NSF-funded 
activities are strategically designed to 
influence state and district reform activities. 
And they get leverage because the state and 
district reform activities cover a larger cross- 
sectional area than the NSF activities. Further, 
those state- and district-level reform activities, 
we hope, build long-term state and district 
capacities, so that those states and districts
can, in a sustainable way, create better 
mathematics and science programs. In turn, 
these programs can help to improve science, 
mathematics, engineering, and technology 
instruction on a broader scale. Finally, such
instruction improves student achievement on
an even greater scale. 

The problem is that as we shine the light
down the cone, we encounter the inverse
square law. The intensity of the light falls off
as the inverse square of the distance down the
cone, so by the time you get to the end of
the tunnel, to the cross section where student
achievement lies, the light is very diffused,
and very dim.
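
The inverse square law invoked in this metaphor can be written out as a short formula. This is only a sketch of the physics behind the image; the symbols (I for intensity, d for distance down the cone, and the reference values I_0 and d_0) are assumptions introduced here, not notation from the talk.

```latex
% I_0 is the intensity at a reference distance d_0 from the source (assumed symbols).
I(d) = I_0 \left( \frac{d_0}{d} \right)^{2},
\qquad\text{so that}\qquad
\frac{I(10\,d_0)}{I_0} = \frac{1}{100}.
```

In words: a cross section ten times farther from the source receives one hundredth of the intensity per unit area.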

To make matters more difficult, this 
second cone on the graphic may represent 
another reform effort. It might, for example, 
be school restructuring. It might be an 
accelerated schools program. It might be 
reduced class size. So, at any given time, in 
any given district, five, six, ten, twenty, forty 
other lights may be shining down this cone. 
And these lights converge in very non-linear 
ways, so that the final illumination of the
cone at the level of student achievement is 
very mixed, noisy, and undifferentiated in its 
mixture. In evaluation language, you have got 
some real attribution problems here. The more 
distant you are, the more attribution problems 
you have. 

Let me summarize some of the real 
difficulties we have in establishing the value 
of NSF’s investments in systemic reform. 

First, we have problems with the scale of the 
investment. In some of the SSIs, the NSF 
contribution is on the order of a couple of 
dollars per student per year. This is a
low-level investment if what we envision is direct
impact at the student level. (Let me add, for 
the record, that there is a paradox in this 
business. It is very hard to find the results of
your work in improved student test scores.
But if you are concerned with professional
development, for example, you need to be
focused on student learning and student
achievement. So it does not mean that you
ignore the details of how students learn and
how they are actually doing. You remain
highly focused on it. However, it does not
work the other way: You cannot operate in the
reverse. You cannot use student achievement
as a measure for assessing the success or
effectiveness of professional development.
The reason for this, as we have shown, is that
there are many streams that lead into the lake
of successful student achievement. You have
to think about this to recognize that there is
quite a paradox here.)

Another analogy will reinforce this very 
important point. Let us presume that you had 
a reform view of agriculture and that you 
wanted to help increase the use of soy protein 
and decrease the dependence on beef for 
protein. So you might devise programs that 
help farmers improve the quality of their soil 
so that it is better for growing protein. If this 
program is successful, you will be able to 
provide a professional development plan to 
farmers, teaching them how to improve the 
quality of their soil. And, it would be 
important for them to have a vision of soy 
production as their ultimate goal. But it would 
be a mistake to evaluate the professional 
development aspects of the program by 
measuring the degree to which soy production 
is actually increased, especially in the short
term. Why? Because there are many 
components that are necessary but not 
sufficient for creating an increase in soy 
production. One is the ability to develop 
appropriate soil. But one must also have the 
necessary amount of sun, water, and seed.
Even more important, perhaps: There must be
the demand for soy and the marketplace
economics that makes it profitable to shift
from beef to soy. So we see that it is very
possible to have an excellent professional 
development program for farmers and yet 
realize very little immediate increase in the 
production of soy. I think an equivalent 
argument holds for the evaluation of teacher 
professional development programs on the 
basis of student achievement. 

Finally, there is the issue of the scale of 
the investment. Many NSF investments are 
very small compared to the scale of the
system they are intended to affect.
There is also a serious question about the time 
scale of the investment. When is it evident 
that systemic reforms are paying off? After 
two months, six months, one year, five years? 
Because investment in systemic change is 
about building capacity, the ultimate pay-offs 
are often delayed and very diffuse. Further, 
what is happening while one is waiting for the 
“downstream benefits” of the “upstream 
investment” that NSF systemic investments
represent? Remember that there are many,
many other things happening simultaneously.
It is a fact that nothing is standing still while
all of this is going on: there are multiple, ever-
changing sources of reform. So, all in all, at
the classroom and student level, you might
say that there is a very low signal-to-noise
ratio, all of which makes it difficult and
probably fundamentally impossible to
evaluate NSF’s systemic investments by
directly connecting them to increases in
student achievement.

The Real Benefits of NSF Investments in 
Systemic Reform 

When we did a study for the National 
Academy of Sciences, and when we evaluated 
the data we had gathered from many states
and districts engaged in systemic reform, 
some things began to emerge for us. We 
began to sense the probability that a particular 
NSF project would succeed. But only a 
probability. Because the success of any 
individual reform effort depended heavily on 
factors that were out of the control of those 
involved in promoting the reform, we were 
actually not very good at predicting success. 

The equation presented here illustrates 
our current thinking about some of the most 
important things determining the probability 
that a reform effort will have real impact on 
the system it is seeking to improve. That 
probability depends, we would argue, on 
some factors with which you are all familiar. 
(These are ideas and factors that you
reformers and evaluators work with intuitively 
every day.) In the equation, the basic idea is 
that success depends upon the capacity that 
exists within the system, on the demand for 
the reform that is being promoted, and it is 
inversely proportional to the system barriers 
that exist. 

The L in the equation stands for 
Leadership. When we visit states or districts, 
it becomes clear that the single most 
important factor is the quality, expertise, 
commitment, and political power of the 
leadership that is promoting the particular 
reform effort. Who is there to do the work?
Are they skilled leaders? Look at the
successful initiatives we have seen over time 
and the one common factor that unites them is 
the presence of an individual, or a core of 
individuals, who are highly skilled, both at the 
district and state level.

D stands for Design. Design here refers to
the knowledge and expertise that exist within 
the reform itself. The probability of a 
systemic reform succeeding is greatly 
enhanced by the presence of well-designed 
curriculum, well-designed assessments, and
well-designed standards. There is also an
element of design and sophistication that one 
looks for in the way in which a reform 
initiative is planned and the way in which its 
planners conceptualize and promote an overall 
“change strategy.” 

PRI stands for Policy and Reform
Infrastructure. The question here is: Does this
state, this district, have ways of doing work
that involve system-wide changes and
improvements? I was in an Appalachian
district recently that had no such reform
infrastructure at all: no concept of how to
create district-wide change or improvement.
The administrators were struggling simply to 
operate their district. They had no means of 
working to improve something. It was a 
foreign notion. 

$ represents Discretionary Dollars: 
Dollars that are allocated specifically for 
reforms in mathematics and science also 
increase the probability of success. 

So these factors, which are internal, are 
all about the capacity of the state or district to 
launch and implement a systemic and system- 
wide reform. But to be successful, it is 
necessary to have more than these internal 
capacities. Political and public demand for the 
reform ideas being promoted is also 
necessary. To mount a successful systemic 
initiative, you need both capacity and 
demand. You can have all of the capacity in 
the world, but if there is no demand, then you 
only have a “supply side” reform. Or, you can 
have demand for change and an external 
commitment, but no capacity within the 
system to provide it. You need both. 

Further, you face barriers that have 
nothing to do with mathematics and science 
reform. In fact, most of the factors that
determine the ultimate impact of mathematics
and science reform do not come out of the
activities conducted in the name of
mathematics and science reform, but are a
product of larger forces. These forces are
detailed in the denominator of our equation.
For example, the scale (S) of the system plays
a large role in shaping the probability of 
success. Big states are much harder to change 
than small states. Big districts are harder to 
work with than small districts. This is because 
large systems tend to become fragmented, and 
the fragments are often at war with each other. 
Thus, it is hard to get a coherent reform 
underway. Also, in large systems, inertia is 
simply a greater problem. 

There are several other system issues and 
events that greatly affect the progress and 
ultimate impact of NSF’s systemic reforms.
For example, states and districts may be swept 
by political “cross currents” (P), with 
progressive and conservative points of view 
battling for control. Or, the district may be 
struggling with severe financial (F) problems. 
Or, generally, the state or district may be 
experiencing great instability (rapid changes 
in leadership or structure or vision), so there is 
a certain amount of “turbulence” (T)
buffeting the system. 

And, ironically, one of the greatest threats 
to reform is the existence of other reforms. 
Many systems are currently suffering from 
what might be called “reform overload” (RO). 
Mathematics and science reforms increasingly
have to compete with other priorities (CP),
for example, literacy.
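
The equation itself was shown on a slide and is not reproduced in this text. One reading consistent with the description is sketched below. It rests on assumptions: that the factors combine multiplicatively (the talk says only that success depends on capacity and demand and is inversely proportional to barriers), that "Demand" stands in for a factor the text names but gives no symbol, and that turbulence is written T.

```latex
% Numerator: internal capacity (L, D, PRI, $) times demand for the reform.
% Denominator: system barriers (S, P, F, T, RO, CP).
P(\text{impact}) \;\propto\;
\frac{\left( L \cdot D \cdot PRI \cdot \$ \right) \times \text{Demand}}
     {S \cdot P \cdot F \cdot T \cdot RO \cdot CP}
```

Here L is leadership, D design, PRI policy and reform infrastructure, and $ discretionary dollars; S is scale, P political cross currents, F financial problems, T turbulence, RO reform overload, and CP competing priorities.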

So, what does this tell us? It suggests that 
there are areas within the system that NSF can 
influence and others that it cannot. It thus 
becomes clear that NSF is not in a position to 
DO the work of reform. At best, I would 
argue that the role of the National Science 
Foundation, the role of federal investment, is 
to increase the probability that a state or 
district will continue to improve its own 
mathematics and science programs. 

I suggest that the appropriate measure of 
the value of NSF’s investments in systemic 
reform lies in assessing the degree to which 
those investments have increased the enduring
capacity of states, districts, and schools to
design, initiate, and sustain high quality 
improvement efforts in mathematics, science, 
and technology. 

To respond to members of Congress who 
ask me how we should evaluate the NSF’s 
investments, I would say the following: When 
NSF funds a five-year systemic initiative, and 
when the NSF-funded work has been 
completed, then that state or district should 
have enhanced capacity to continue the 
improvement of its mathematics and science 
programs. This capacity I am speaking of is 
not abstract: It consists of those factors that 
are listed in the numerator of our equation. 

For example, systemic initiatives should 
definitely leave in place leaders who are 
highly knowledgeable and skilled in 
mathematics and science reform. The vision 
of what good mathematics and good science is
should be more sophisticated as a result of the 
initiative. Finally, there should also be an 
increase in the degree to which the state or 
district addresses curriculum and professional 
development issues with well-designed 
programs. The state or district should, in the 
long run, be better connected to multiple 
resources and multiple sources of expertise. 

Thus, NSF’s investments can and 
should be evaluated by assessing the degree to 
which they build the state or district capacity 
for initiating, and sustaining, further reform. 
NSF should not, I would argue, be held 
accountable for what the state or district does 
with that capacity. That is not the federal role, 
and it is beyond the control of NSF. But I do 
think that NSF should be held accountable for 
the capacity building that is a central goal of 
systemic reform. Measures of capacity are not
impossible to design, and evaluators, I would
argue, are capable of assessing the degree to
which the capacity of a system is, or is not,
increased over several years. Putting the
emphasis on capacity could help to sharpen
the focus of NSF’s systemic initiatives. By
contrast, if we insist on assessing the value of
NSF’s work by assessing student outcomes,
then such evaluations will do much to confuse
all involved and, ultimately, blunt the
effectiveness of the work of these initiatives.

In conclusion, I would add one final
thought. Even in terms of politics and the 
political considerations that are always a 
factor, there is a fundamental wisdom in 
focusing on the capacity-building nature of
these initiatives. I would point out that it is a 
very dangerous political strategy to design 
evaluations to serve short-term political goals. 
To argue that NSF’s systemic initiatives are in 
themselves directly causing increased student 
achievement in the short term is to stretch the
trust and credibility that people are willing to
give to evaluation claims. I think it is a wiser 
long-term political strategy to document the 
contributions the systemic initiatives are, in 
fact, making. It is a better long-term strategy 
to tell the truth about what these initiatives 
can and cannot do, and then document well 
their real benefits to the states and districts 
they are serving. That is, for me, a more 
grounded and satisfying way to go. 

THE EVALUATION OF NSF’S INVESTMENTS IN
SYSTEMIC REFORM INITIATIVES

• THE SYSTEM MUST PRIMARILY BE IMPROVED BY THOSE LIVING
AND WORKING WITHIN THE SYSTEM.



Introduction to Breakout Session II Question Summary

Each panel was followed by a Breakout Session. Participants 
were assigned to small groups of ten to twelve, led by a
facilitator, in a discussion of three questions and other issues 
raised by the presenters. Each set of three questions was 
developed by the organizers of the Forum. At the beginning of 
the Breakout Session, participants were asked to write their 
responses to each of the three questions on index cards. The 
comments that the participants wrote were used to begin the 
small group discussions. The index cards were given to two 
people, who provided a synthesis of the conference; comments 
on the index cards were incorporated into their comments. 
Responses to the first question are summarized here to 
provide examples of participants’ comments. 

Breakout Session II: Successful
Strategies for Evaluating Systemic 
Reform 

Participants’ Comments: 

Q: What are the essential elements of an 
evaluation of systemic reform? 

The 124 participants who responded to 
this question identified a wide range of 
approaches for evaluating systemic reform. 

The groupings of responses given below are
listed in order, from issues raised most 
frequently by participants to those raised less 
frequently. 

1. Attend to the specific components and
indicators of the system and its reform — 
including forces for change, goals, 
visions, policy, and accountability. 
Twenty-eight percent of the participants 
who responded emphasized the 
importance of measuring and tracking 
specific components of the system. For 
example, one respondent wrote, “Need to
determine impact on curriculum, 
professional development, assessment, 
resources, policies, and student 
outcomes.” Another respondent wrote, 
“Define/determine all system 
components. Assess gains (or lack 
thereof) in all components of system.” 

2. Measure student learning and 
outcomes. One-fourth of those who 
responded to the Session II question noted 
that the evaluator needs to assess student 
learning. Some of these respondents 
indicated that an evaluation of systemic 
reform also should help to identify what 
constitutes effective measures of student 
success and achievement. As one 
respondent wrote, “Student achievement 
gains mean nothing if the measurement 
isn’t measuring what you really want 
students to learn. ... We need to study 
that part of the system.” Some of the 
responses specifically stated that the focus 
on student achievement was in contrast to
St. John’s perspective. In his talk, St. John
stated that reform of systems is so 
complex and has to be sustained over 
such a long time that to detect changes in 
student achievement after only two or 
three years is premature. At best, 
evaluations can address improvement in 
the system’s capacity, the degree to which 
efforts are sustainable, and the issue of 
whether the system is moving on a 
trajectory toward significant change. 

3. Evaluate the comprehensiveness and
coherence of the system and its reform, as 
well as the interconnections among the 
components. Eighteen percent of the 
respondents indicated that the evaluation 
should consider the system as a whole, in 
contrast to responses that focused on 
evaluating specific components (see 
Response 1 above). Some of the
respondents indicated that essential to 
systemic reform is coherence within the 
system. For example, one respondent 
wrote, “There must be coherence in the 
form of the vision and personal 
understanding to lead the evolution [of 
systemic change]. ...” Respondents 
raised the importance of studying the 
interactions and relationship among 
system components as part of evaluating 
the system as a whole. One respondent 
replied, “Look at the whole system and 
system interactions rather than individual 
components. ... [Examine] process
variables that might relate to standards, 
curriculum, instruction, ...” Another 
respondent felt that the evaluation of 
systemic reform “must address multiple 
levels, while acknowledging the 
transactional and synergistic effects 
between the components.” 

4. Evaluate the processes, means, and 
conditions needed to attain systemic 
reform. Seventeen percent of the 
respondents indicated that it was 
important for an evaluation of systemic 
reform to consider and measure those key 
attributes of a system that are related to 
systemic reform, such as improvement of 
capacity and alignment. Some of these
respondents phrased their response by 
looking at pressure points considered 
important for changing the system. One 
participant agreed with St. John’s 
comments, “Capacity building/ 
infrastructure development is much more 
consistent with the immediate scope of 
the [systemic initiatives] than student 
achievement. To grab at change in student 
achievement as artifacts of SI is dishonest 
at best.” Another respondent felt that the 
main goal for evaluation should be “the 
extent to which a district has been helped 
to improve/sustain its own ‘reform’ 
efforts.”

5. Study beliefs, roles, philosophies, and
buy-in of key actors and stakeholders. 
Fourteen percent of the respondents made 
some comment stressing the need to look 
at the leaders, stakeholders, teachers, and
other key actors. For evaluators to do 
evaluation of systemic reform, they need 
to understand what these people believe 
about reform and the vision for reform 
and what ownership they have in the 
reform process. It is also important for the 
evaluator to be sensitive to the fact that 
stakeholders play multiple roles, or serve 
multiple constituencies. One respondent 
noted the importance of determining 
whether there is a broad base of support 
among the key stakeholders. Another 
respondent said it was important for an 
evaluation to differentiate between what 
stakeholders say and what they believe. A 
third respondent thought that an 
evaluation had to consider the belief 
system of teachers and how their beliefs 
relate to their practices. 

6. Focus on change over time and the 
critical indicators that mark change. 
Fourteen percent of the respondents 
indicated that an evaluation should 
consider progress over time. In order to
do this, it was essential to have baseline
information and evidence of how efforts 
have been sustained over time. One 
respondent commented that it was 
essential to collect good baseline data
about nearly all aspects of the system and 
that the evaluation should be designed to 
monitor the progression of key elements. 
Another respondent reported as an 
essential element, “Measuring the change 
in influence of reform on the school 
system in the area of policy, curriculum, 
realignment, standards-based instruction, ...”

Respondents offered other comments on 
what they regarded as the essential elements 
of the evaluation of systemic reform.
However, no more than 10% of the
respondents agreed on any one essential 
element for evaluation of systemic reform in 
these remaining comments. Some of the 
respondents emphasized that an evaluation 
should be based on a model, or driven by a 
theory of systemic reform, perhaps using 
mapping to locate important functions within 
the system. Some respondents advised 
evaluators of systemic reform to look at 
system outcomes other than student learning, 
such as those related to professional 
development and change in teaching 
practices. A few comments cautioned those 
doing evaluation of systemic reform to 
consider the time frame for changing school 
systems and to think about what it is feasible 
and reasonable to do within a given time 
frame. Three or four participants mentioned 
the important role that evaluation can serve by 
providing feedback to the system undergoing 
reform and the need for evaluators to consider 
the different audiences for the evaluation in 
determining what evidence is gathered and 
reported. Finally, two participants offered a 
reminder of how important it is that different
evaluators verify findings and that valid
measures be used.
Panel III: Findings on Systemic Reform from Evaluation and Research



Panel Papers and Authors: 

Discovering from Discovery: The Evaluation of Ohio’s Systemic Initiative
Jane Butler Kahle 

Evaluative Findings on Systemic Reform: Lessons Learned from NSF 
Daryl E. Chubin

Value-Added Indicators
Robert H. Meyer 

Quantitative and Qualitative Data in the Theory of Systemic Reform 
William H. Clune 



Discussion Summary and Commentary: 
Findings on Systemic Reform from 
Evaluation and Research 

Norman L. Webb

Panel III speakers presented examples of 
effective systemic reform efforts and 
described specific evaluation and research 
strategies or practices utilized during the 
evolution of successful systemic reform 
programs. Presenters’ perspectives were 
framed on one hand by the analysis of one 
state’s experience in evaluating its SSI and on 
the other by the findings of the National 
Science Foundation during a decade of 
supporting systemic change in K-12 
mathematics and science. Two presenters 
focused on data management and analysis
issues that have proved productive in 
evaluating SSIs. 

Jane Butler Kahle, professor of science 
education at Miami University-Ohio, has
been involved both in “doing reform and
assessing” it. Former director of Ohio’s SSI,
she became the principal investigator for 
Project Discovery, which was originally 
designed to impact middle school science and 
mathematics education, with a primary focus 
on the professional development of teachers. 
Support of Discovery was provided from both 
the state and federal levels, enabling the state 
to assess changes over a period of five years. 

Structurally, a three-tier nested research 
design was used which yielded different, yet 
important data at each of three levels:
questionnaires to a random selection of
teachers at the state level; annual visits to
randomly selected schools at the district level
to validate questionnaire responses; and
intensive case studies in five schools (three 
urban, one small town, and one suburban) 
focused on equity and school readiness for 
reform. Two sets of findings of the Discovery 
Project proved to have major policy 
implications. First, Ohio found compelling 
evidence that its sustained professional 
development program significantly changed 
teaching practices. The average Discovery 
participant reported increased use of 
standards-based teaching; follow-up 
questionnaires indicated that these changes 
were sustained for several years, and that they 
affected the culture of professional 
development throughout the state. Second, 
improved mathematics and science scores 
were achieved by students taught by teachers 
who had completed Discovery’s professional 
development program; African American
and White students scored higher than their
peers in non-Discovery classes. A third major
finding in Ohio’s project was the importance 
of effective communication of findings by 
publishing and widely distributing easy-to- 
understand charts of results. While mainly 
quantitative findings were presented, an effort 
was made to present data in formats the public 
and the legislature could assimilate. 

Kahle made three major points regarding 
the evaluation of the Ohio systemic initiative: 
First, it is important to evaluate the reform 
while it is occurring, creating a data-driven
reform. Ohio found that on-going, continuous 
evaluation of Discovery attracted the support 
of legislators and of the parents, teachers, and 
administrators needed to sustain the effort. 
Second, evaluation of complex systems must 
include both quantitative and qualitative data. 
Third, evaluations intended to guide or 
accelerate reform are only as effective as their 
means of communication. 

Daryl Chubin, of the National Science
Foundation (NSF), provided a sponsor’s
perspective on the Foundation’s efforts to 
stimulate systemic reform in mathematics and 
science in American public education. His 
paper focused on a rigorous examination of 
three issues as their importance became 
defined during NSF’s on-going experience 
with systemic reform: program evaluation as 
an accountability tool; the measurement 
challenge; and quality of information. In
confronting the challenge of developing 
accountability tools, NSF in 1996 produced its 
Instrument for Annual Report of Progress in 
Systemic Reform, which identified “drivers” 
that codify the principal dimensions of 
planning, activities, and reflection in systemic 
initiatives. The measurement challenge 
reflects the complexity of the systems with 
which NSF is partnered in its systemic 
initiative programs. Because effective, 
system-wide reform is a complex, nuanced, 
and uncertain endeavor, with variables that 
are often specific to the system in transition, 
no “detached” external measure can produce 
the insights that careful self-reporting yields. 
The quality of NSF investment and its payoff 
in systemic change are a function of the 
quality of information obtained from SSI 
projects. A combination of methods to gauge 
progress includes annual reviews, site visits, 
and performance effectiveness reviews. 

Special events such as field hearings also 
produce vital feedback, commentary, and 
guidance. 

Robert Meyer, research scientist at the 
Wisconsin Center for Education Research and 
the Harris Graduate School of Public Policy at 
the University of Chicago, discussed the 
weaknesses of the most commonly used 
educational indicators and the advantages of 
employing value-added indicators in the
analysis of educational change. After 
connecting specific measurement flaws and 
defects to typical indicators, he stated his 
belief that to have indicators that are 
appropriate for accountability and/or 
evaluation purposes, systems need to design 
indicator/evaluator systems based on the 
value-added approach. The key challenge is to 
isolate the contribution of schools to growth 
in student achievement from all other sources 
of student achievement over a given time 
period. Following a critique of the “average 
test score” as a valid measurement of change, 
he presented a theoretical simulation. He then 
proceeded to an analysis of NAEP’s 1973-1986
study of 11th grade mathematics scores,
showing gains in academic achievement based
on average test scores, whereas an analysis
of the data based on a gain indicator similar to
but not the same as a value-added indicator
suggests the opposite. A basic problem is that 
NAEP data do not permit value-added 
analysis, since the same students are not 
sampled for two consecutive NAEP surveys. 

Thus, a major question is whether value- 
added indicators can be used as the foundation 
for school district, state, and national 
performance indicator/accountability systems. 
There are reasons for optimism because 
researchers/evaluators have been applying 
value-added models in education and training 
programs for three decades; and some districts 
and states have successfully implemented 
value-added indicator systems. The value- 
added approach to measuring school 
performance relies on a statistical model to 
identify the distinct contributions made by 
schools to growth in student achievement. In 
conclusion, he indicated that four factors 
determine the quality of value-added 
indicators: testing frequency; the quality and 
appropriateness of the tests; adequacy of 
control variables included in the statistical 
models; and the technical validity of the
statistical models used to create the indicators. 
To implement a value-added system, states 
and schools need to consider testing students 
at every grade level, including summer school 
and in-migrating students; it is important that 
states make it a major priority to collect 
extensive, reliable data on student/family
characteristics and that they develop tests that 
are technically sound and aligned with 
educational goals. 
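
As a minimal sketch of the distinction drawn above between an average-test-score indicator and a value-added indicator, the Python fragment below uses invented scores for two hypothetical schools. It is an illustration under simplifying assumptions, not the model Meyer presented: prior achievement is the only control variable, the slope on prior achievement is assumed to be common to both schools, and every name and number is hypothetical.

```python
from statistics import mean

# Hypothetical (prior score, post score) pairs for students in two schools.
# School A enrolls high scorers who gain little; school B enrolls low
# scorers who gain a lot. All numbers are invented for illustration.
scores = {
    "A": [(80, 82), (85, 87), (90, 92)],
    "B": [(50, 60), (55, 65), (60, 70)],
}


def average_score_indicator(data):
    """Mean post-test score per school (the 'average test score' indicator)."""
    return {school: mean(post for _, post in pairs) for school, pairs in data.items()}


def value_added_indicator(data):
    """School effects from a regression of post score on prior score with
    one intercept per school and a common slope (a deliberately simple
    stand-in for a fuller value-added model)."""
    sxy = sxx = 0.0
    means = {}
    for school, pairs in data.items():
        x_bar = mean(prior for prior, _ in pairs)
        y_bar = mean(post for _, post in pairs)
        means[school] = (x_bar, y_bar)
        # Accumulate within-school deviations to estimate the common slope.
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    # Each school's intercept is its estimated contribution to growth.
    return {s: y_bar - slope * x_bar for s, (x_bar, y_bar) in means.items()}


avg = average_score_indicator(scores)
va = value_added_indicator(scores)
print(avg)
print(va)
```

With these invented numbers, the average-score indicator ranks school A above school B, while the value-added indicator ranks B above A, because B's students grow more once their starting point is taken into account. A fuller model would add the student and family characteristics mentioned above as further controls.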

William H. Clune, professor of law at the 
University of Wisconsin and project co- 
director at the Wisconsin Center for 
Education Research, focused on one major
methodological aspect of using quantitative
and qualitative data in evaluating systemic
reform. Using a theory developed by a NISE 
team, he demonstrated the usefulness and 
limits of quantitative ratings as a tool for 
understanding systemic reform. Based on nine 
SSI case studies published by SRI in March
1998, he rated the states on the basis of four
variables, each measured by breadth and
depth, that are essential aspects of systemic
change: reform, policy, curriculum, and 
achievement. Attributes of successful reform 
included vision, strategic planning, 
networking with policy makers and with 
professionals, institutionalization of the 
reform structure, leveraging of resources, and 
public outreach and visibility. The reform was 
considered broad if it included all of these 
elements and the elements touched all levels 
of policy, and deep to the extent that each 
element was well developed and influential. 

He explained that to test the theory of 
systemic reform, it was necessary to 
determine whether higher levels of reform and 
policy do, in fact, produce change in teaching 
and learning. Low ratings in reform and 
policy indicated, however, that as a group the 
SSIs fell short of achieving the ultimate goal 
of transforming entire states. 

Clune developed several points regarding 
the usefulness of quantitative data in making
generalizations about reform, pointing out,
however, that qualitative information is 
needed as a means of interpreting the 
numbers. A common substantive problem was 
the pedagogical orientation of reform: its
emphasis not simply on teaching, but on 
active learning. Direct means of influencing 
curriculum were relatively rare, especially 
early in the reform process. The gap between 
pedagogy and content narrowed as reform 
progressed, partly, he noted, as a result of 
prodding by NSF. But few of the SSIs began 
with changes in course content and pedagogy 
embedded in their development design. The 
best evidence of curriculum change has come 
from data on teacher training, surveys of 
teacher attitude and practice, and some cases 
of whole-school restructuring. Early in the 
reform effort, classroom change was not a 
clear objective of policy; many SSIs were 
built around professional development, with 
teacher capacity as the goal, rather than 
curriculum upgrade. 

Briefly discussing the limits of usefulness 
for numbers, he noted that many 
generalizations about systemic reform require 
purely qualitative analysis. He concluded with 
the statement that no other technique seems as 
capable as numerical ratings of testing the 
basic hypothesis that strength in one variable 
produces strength in the next, summarizing 
the overall progress of individual reforms and 
the reform effort, and analyzing the status of
reform components across sites; but that many 
data in a quantitative analysis are the result of 
qualitative inquiry and that many important 
patterns across reforms can only be 
recognized and understood as a result of 
thoughtful qualitative inquiry. 
DISCOVERING FROM DISCOVERY: THE EVALUATION
OF OHIO’S SYSTEMIC INITIATIVE 



Jane Butler Kahle 
Miami University-Ohio 

Expand the Discovery Project. This National Science Foundation-funded initiative started in 1991 
as a project devoted to improving middle school science and mathematics education. The primary 
focus of the program has been on teacher professional development. In 1997, this successful 
program received appropriations of $2.5 million per year from the state budget and the mission 
was expanded to improving math and science education from
elementary through graduate school. The Taft Administration will expand this program and
refocus it on the elementary and middle school years. 

(Bob Taft, Ohio Governor-Elect, September 2, 1998, p. 29). 



Background 

Discovery began in 1991, when Ohio was 
among the first cohort of states to receive 
Statewide Systemic Initiative awards from the 
National Science Foundation (NSF). During 
the past decade, we have simultaneously been 
doing reform and assessing it. Although other 
papers detail specific findings (see, for 
example, Kahle, 1997; Boone, 1998; Carnes, 
1998; and Damnjanovic, 1998), this one will 
focus on some findings that have had major 
policy implications. 

Discovery has been fortunate in two ways: 
first, Ohio’s General Assembly continued to 
support Discovery after the period of NSF 
funding; and, second, NSF funded a new 
project, Bridging the Gap: Equity in Systemic 
Reform, that continues the evaluation of 
systemic reform in Ohio. Therefore, we have 
been able to assess changes over five years 
and to use our findings to guide and accelerate 
the reform of mathematics and science 
education in Ohio. Further, we anticipate that 
the catalysts and barriers identified in Ohio 
will be common to many systemic efforts and 
will contribute to the knowledge base about 
systemic reform. 

Three years into Ohio’s reform, we began 
to assess its progress. A three-tier, nested 
research design has been used, which yields 
different, yet important, data at each of three 
levels. At the state level, we have used 
questionnaires with a random sample of
teachers and administrators in over 100
schools to provide evidence of changes in 
teaching practice, in administrative support, 
and in teacher expectations. At the district 
level, annually we have visited from 12 to 16 
schools that are part of the larger random 
sample. Our observations over several days 
have validated questionnaire responses and 
have allowed us to place the quantitative data 
in context. In addition, student achievement 
and attitudinal data have been collected in the 
schools visited. Simultaneously, we have been 
conducting intensive case studies in five 
schools (three urban, one small town, and one 
suburban). The case studies are focused on 
equity and on how systemic reform works in 
schools that are at different stages of readiness 
for reform. They are providing information 
about opportunities to learn as well as about 
catalysts and barriers to reform. 
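
The nesting of the first two tiers described above can be sketched in a few lines of Python. This is only an illustration of the idea that the visited schools are drawn from within the questionnaire sample; the sampling frame, the sample sizes, the seed, and the function name are assumptions, and the five case-study schools are not modeled because the text does not say how they were chosen.

```python
import random


def draw_nested_samples(all_schools, n_state=100, n_visits=12, seed=0):
    """Draw a state-level random sample, then a visit sample nested in it."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    state_sample = rng.sample(all_schools, n_state)
    # Tier 2 is drawn only from tier 1, so every visited school
    # also has questionnaire data that the visit can validate.
    visit_sample = rng.sample(state_sample, n_visits)
    return state_sample, visit_sample


# Hypothetical sampling frame of school identifiers.
frame = [f"school_{i:04d}" for i in range(1500)]
state_sample, visit_sample = draw_nested_samples(frame)
print(len(state_sample), len(visit_sample))
```

Because the visit sample is a subset of the questionnaire sample, site observations can be compared directly with the same schools' questionnaire responses.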

Discoveries About the Evaluation of 
Systemic Reform 

For this paper, I have identified three 
major points gleaned from the evaluation of 
one systemic initiative. First, it is important to 
evaluate the reform while it is occurring. In 
that way, policies and practices may be 
influenced by the findings, and the reform 
becomes data-driven. In addition, the on- 
going and continuous evaluation of Discovery 
has helped it attract the support that is needed 
from state legislators and governors and, more
importantly, from parents, teachers, and
administrators to sustain a reform. Second, 
evaluation of complex systems must include 
both quantitative and qualitative data. And, 
third, evaluations intended to guide or 
accelerate the reform process are only as 
effective as their means of communication. 
These points are illustrated through the 
findings presented below. 

Discovery’s Most Powerful Findings 

Although Discovery and Bridging have 
assessed multiple aspects of Ohio’s reform, 
two sets of findings have had a major impact 
on the policies and practices of the reform. 
First, there is compelling evidence that 
Discovery’s sustained professional 
development (six-week summer content
institutes, taught by inquiry, followed by six
academic-year seminars on equity,
assessment, and pedagogy) has changed
teaching practices. Teachers completed 
questionnaires regarding the nature of their 
teaching before they began their summer 
professional development and in the spring of 
the following year for three years. The items 
reflected a range of standards-based teaching 
practices (e.g., working in small groups, doing
inquiry activities, making conjectures, and 
exploring alternative ways to solve a 
problem). The average participant, in both 
mathematics and science, reported an increase 
in the use of standards-based teaching 
practices after participation in the SSI’s 
professional development program, and 
follow-up questionnaires indicated that those 
changes were sustained for several years 
(Supovitz, 1996). These findings have been 
corroborated by classroom observations, 
teacher and student interviews, and by teacher 
and student portfolios. Other evidence 
indicates that they have affected the culture of 
professional development in Ohio as more 
districts offer or reward long-term experiences.

The second set of important findings 
involves student achievement. Because it is 
difficult, if not impossible, to establish 
causality in a complex, multi-year reform 
effort, we have analyzed student achievement 
data on Discovery’s Inquiry Tests in several
ways, and we have sought other types of
achievement data. The intent is to identify 
patterns or trends that suggest that any change 
is more than a chance phenomenon. The 
examples below illustrate several strategies 
used. 

In one analysis of student achievement, 
socio-economic level of the students was 
controlled, while in another, possible
bias in the teacher group was controlled. 
Student achievement data in both studies 
indicated improved learning by African 
American and White students who were 
taught by teachers who had participated in 
Discovery’s sustained professional 
development. For example, a comparison of 
610 science students in matched science 
classes (e.g. seventh grade life science) 
indicated that both African American girls 
and boys in classes taught by Discovery 
teachers scored 9% higher on the Discovery 
Inquiry Test in science than did their peers in 
the matched classes. In addition, White girls 
in Discovery classes scored 10% higher, and 
White boys scored 4% higher than their peers 
in non-Discovery classes (Damnjanovic,
1998). Similar results have been found in
mathematics classes (Goodell, 1998). 

Other analyses have controlled for teacher 
motivation, or the “volunteer” effect, by 
comparing the achievement of students whose 
teachers have volunteered to participate, but 
have not done so, with that of students whose 
teachers have completed the professional 
development. (See Corcoran, Shields, & 
Zucker, 1998, for a discussion of this issue). 
The positive effect of the Discovery 
professional development is suggested by 
higher scores (from 2% to 7%) on both the 
mathematics and science Inquiry Tests by 
students (N = 2374) whose teachers had 
completed the Discovery programs, compared 
to those who had not (Supovitz, 1996). 

Independent analyses have established 
that the gender gaps in both mathematics and 
science have been decreased both across and 
within racial groups (Damnjanovic, 1998; 
Goodell, 1998). Analyses, using the whole 
data set, indicate that the achievement gap 
between African American and White 
students (favoring Whites) has narrowed but 
persists. However, those analyses do not
necessarily involve African American and 
White students who are in the same 
classroom. Therefore, we analyzed the 
achievement of African American and White 
students in the same classes by using only 
classes in the sample that had at least 25% of 
their students in a minority group (either 25% 
African American or 25% White students). 
Although many classes did not fit that profile, 
we had a representative sample (comparable 
numbers of classes taught by Discovery and 
non-Discovery teachers) for three years in 
mathematics. One hundred and eight classes 
were involved, enrolling over 3000 students. 
The findings show a narrowing of the 
achievement gap (favoring Whites) in 
mathematics in classes taught by Discovery 
teachers (from 10.4 percentage points in 1995 
to 7.5 in 1997), and a widening of the gap
(from 7.3 percentage points in 1995 to 15.1 in 
1997) in classes whose teachers had not 
participated in the sustained professional 
development. 
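
The arithmetic behind the narrowing and widening gaps can be made explicit with a short Python sketch. The four gap values are those reported in the preceding paragraph; the function name and the framing of the final comparison as a simple difference between the two changes are additions for illustration and are not part of the original analysis.

```python
# Achievement gaps (percentage points, favoring White students) as
# reported in the text for mathematics classes.
gaps = {
    "Discovery": {1995: 10.4, 1997: 7.5},
    "non-Discovery": {1995: 7.3, 1997: 15.1},
}


def gap_change(series, start=1995, end=1997):
    """Change in the gap between two years; negative means narrowing."""
    return series[end] - series[start]


changes = {group: round(gap_change(s), 1) for group, s in gaps.items()}
# Difference between the two changes: how much more the gap moved in
# Discovery classes than in non-Discovery classes.
relative_change = round(changes["Discovery"] - changes["non-Discovery"], 1)
print(changes, relative_change)
```

The gap narrowed by 2.9 points in Discovery classes and widened by 7.8 points in non-Discovery classes, a relative difference of 10.7 points. This is descriptive arithmetic only and says nothing about statistical significance.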

Two new types of achievement data have 
been obtained. First, in 1998, we were able to 
obtain Ohio Proficiency Test (OPT) mean 
scores for 1997 and 1998 for schools in 
several large urban districts. Because those 
scores are reported publicly by pass/fail rates 
only, they have not been useful in the past. 
Two criteria were used to select the 
middle/junior high schools in each district for 
the analysis. They were: over 70% African 
American students and over 55% students 
eligible for free or reduced-price lunch. All 
schools in an urban district that met those 
criteria were included in the analysis, and 
each district was analyzed separately.
Analyses were run for whole districts in order
to explore any effect of a “critical mass” of 
Discovery teachers on OPT scores in 
mathematics and science. “Critical mass” was 
operationally defined as over 51% (full-time
equivalent) of the science and mathematics
teachers in the school having participated in a
Discovery professional development program. 
Over 13,000 seventh through ninth grade 
students attended the schools that were used 
in the analysis. We identified two patterns. In 
districts where policies were aligned with 



reform practices, OPT scores in mathematics 
and science rose with the percent of Discovery 
teachers, while in districts with little policy 
alignment the percent of Discovery teachers 
did not affect OPT scores. For example, in 
high alignment districts, OPT scores 
improved 17.5% in mathematics and 9.2% in 
science in schools with more than 51%
Discovery teachers. On the other hand, in
those same districts OPT scores declined 
11.3% in mathematics and 3.3% in science in 
schools with fewer than 25% Discovery 
teachers. In schools with little alignment 
among state, district and Discovery efforts, 
there was little variation among the scores of 
students by percentage of teachers who had 
participated in the professional development. 
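
The operational definition of “critical mass” used above can be sketched as a small classification rule. The thresholds (over 51% and fewer than 25% of full-time-equivalent science and mathematics teachers) come from the text; the function names, the label for the band between the two thresholds, and the example schools are hypothetical.

```python
def discovery_share(discovery_fte, total_fte):
    """Share of science and mathematics teaching FTE held by Discovery-trained teachers."""
    if total_fte <= 0:
        raise ValueError("total_fte must be positive")
    return discovery_fte / total_fte


def classify_school(discovery_fte, total_fte):
    """Label a school using the thresholds mentioned in the text."""
    share = discovery_share(discovery_fte, total_fte)
    if share > 0.51:
        return "critical mass"  # over 51% Discovery teachers (FTE)
    if share < 0.25:
        return "low"  # fewer than 25% Discovery teachers
    return "intermediate"  # assumed label; the text does not name this band


# Hypothetical schools: (Discovery FTE, total science/mathematics FTE).
examples = {"school_x": (6.0, 10.0), "school_y": (2.0, 10.0), "school_z": (4.0, 10.0)}
labels = {name: classify_school(*fte) for name, fte in examples.items()}
print(labels)
```

Grouping schools this way within each district is what allows score changes to be compared between schools above the critical-mass threshold and schools with few Discovery teachers.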

We were able to interpret and place these 
findings in context because of the extensive 
amount of qualitative data we had collected in
the cooperating districts. 

Not all of our attempts to obtain evidence 
concerning achievement have been successful.
In 1998, we explored the use of performance
assessments by implementing performance 
tasks from the Third International 
Mathematics and Science Study (TIMSS) in 
selected schools (student N = 500). In 
addition, multiple-choice versions of selected
TIMSS tasks were added to the Discovery
Inquiry Test. For one school, student 
responses (N = 65) on the TIMSS multiple 
choice items were compared to their 
responses on the TIMSS performance tasks. 
Initial analysis of the data suggests that paper 
and pencil tasks alone inadequately measure 
student understanding, particularly the 
understanding of urban, African American 
students (Kelly & Kahle, 1998). For example, 
for those students who responded to both 
types of items (multiple choice and 
performance), 86% were able to identify 
patterns in data when they had collected the 
data and drawn the graph (performance task). 
Only 8% were able to correctly identify 
patterns when the data were presented in the 
paper and pencil test. However, expense as 
well as unresolved technical problems in both 
delivery and scoring have prohibited the 
continued use of performance items. 
My third major point was the importance
of effective communication. Annually, our 
findings are published and widely 
disseminated in the Pocket Panorama. To 
date we have communicated mainly 
quantitative findings, learning how to present 
data in easy to understand ways and how to 
reach key legislators. The next challenge is to 
learn to communicate succinctly and clearly 
the complex stories that are emerging from 
our case studies. 

Summary 

Analyzing data in multiple ways has 
allowed us to tell a convincing story, a story
that has led to substantive changes. First, the 
culture of professional development has 
changed in Ohio, with long-term, substantive 
programs preferred or mandated. Second, in
order to accelerate improved student
achievement, we have moved from teachers to 
schools as the unit of change. Variations of 
the content institutes are taught now at the 
district level, and Discovery’s new institute 
for principals is in demand across the state. 
Evaluations that occur concurrent with 
reform, that collect multiple types of data, and 
that effectively communicate their findings 
with broad audiences can shape a reform. 

The preparation of this paper was funded in 
part by a grant from the National Science 
Foundation, Grant #REC 9602137 (J. B.
Kahle, Principal Investigator) and by National
Science Foundation Grant #OSR-92500 (J. B. 
Kahle and K. G. Wilson, Co-Principal 
Investigators). The opinions expressed are 
those of the authors and do not necessarily 
reflect the position of the National Science 
Foundation. 
Bridging the Gap: Equity in Systemic Reform

[Figure] Predicted Student Performance on Discovery’s
*Controlled for urbanicity, concentration of poverty, and grade level.
Source: Supovitz, 1996, February

[Figure] Changes in Percent of Students Passing Ohio Proficiency Tests
(Grade 8) in Cities with Aligned Reform Policies
*The average percent for all schools in each cluster is based on Ohio Department of Education data.

[Figure] Changes in Percent of Students Passing Ohio Proficiency Tests
*The average percent for all schools in each cluster is based on Ohio Department of Education data.